Welcome to the SHARP Multi-omics Workshop. The goal of this workshop is to explore statistical methods for the analysis of multi-omic (or multi-view or multi-layer) data in observational studies. Many population-based or observational studies supplement a primary goal of investigating the effect of a risk factor on an outcome with additional omic data to better characterize the risk factors (e.g. germline genetics, exposomics), to provide measurements for intermediate variables (e.g. transcriptomics, proteomics, metabolomics, and the microbiome), and/or to define a specific outcome of interest such as a single biomarker or multiple biomarkers. While omic measurements often share a “high-dimensional” aspect, the different omic “dimensions” can vary extensively in their scale of measurement, correlation structure, and strength and proportion of associations. In this context, the investigator is often confronted with an analytic decision between simplicity and complexity. Simple approaches often treat sets of variables in a pairwise independent manner, sacrificing joint evaluation for benefits in interpretability. Complex methods often model joint correlation structures, but can sacrifice ease of interpretation.

Conceptually, multi-omic data can be integrated following several philosophical approaches, as summarized in Picard et al. 2021:

  • 1. Early Integration

    • This concatenates every omic layer into a single large matrix with subsequent analysis methods applied to this single matrix.
  • 2. Mixed Integration

    • Transforms each omic layer into a simpler representation with subsequent analysis methods applied to these simpler or processed features.
  • 3. Intermediate Integration

    • Any approach that jointly integrates the multiple omic layers without pre-processing each layer.
  • 4. Late Integration

    • Approaches that apply association analysis to each omic layer and the features within it, and then “post-process” the results.
  • Dimensional reduction: Within each of these types of approaches, there is often a need to reduce the number of variables for analysis, whether for computational efficiency, to reduce statistical noise, or to identify underlying latent structures or clusters that characterize patterns in the variables. Such reduction techniques generally fall into two main types.

    • Feature selection: determining a smaller set of features that keeps most of the relevant information. Examples include machine learning methods and regularized regression.
    • Feature extraction, clustering, or latent estimation: transformation of the native features into a reduced set of variables that “capture” similar information. Examples include principal component analysis and k-means clustering.
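As a minimal sketch of how early integration and feature extraction fit together, the example below concatenates two simulated layers and extracts principal components, and then contrasts this with reducing each layer separately. The layer names, dimensions, and values are hypothetical and are not taken from the workshop data.

```r
set.seed(1)
n <- 50  # individuals, shared across layers

# Two hypothetical omic layers measured on the same individuals
proteins <- matrix(rnorm(n * 10), n, 10,
                   dimnames = list(NULL, paste0("prot", 1:10)))
metabolites <- matrix(rnorm(n * 20), n, 20,
                      dimnames = list(NULL, paste0("met", 1:20)))

# Early integration: concatenate the scaled layers column-wise,
# then apply one feature extraction step to the combined matrix
early <- cbind(scale(proteins), scale(metabolites))
early_pcs <- prcomp(early)$x[, 1:2]

# Alternative: reduce each layer on its own and keep one component per layer
per_layer_pcs <- cbind(
  prot = prcomp(proteins, scale. = TRUE)$x[, 1],
  met  = prcomp(metabolites, scale. = TRUE)$x[, 1]
)

dim(early)          # individuals x all features
dim(early_pcs)      # individuals x extracted components
dim(per_layer_pcs)  # individuals x one component per layer
```

Either reduced matrix could then enter a downstream association model in place of the original features.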

Since most observational studies use association analysis as the bedrock for inference, in this workshop we will build from the basic association framework and discuss extensions for integrated multi-omic analysis, always focusing on how integration strategies can be used to investigate the subsequent role of an omic layer or specific feature on an outcome of interest. Accordingly, the workshop focuses on expanding or integrating multi-omic data into an association framework.

Integrated analyses that use a general mediation framework together with ideas of dimension reduction are illustrated in the above figure, with each element of the grid indicating a potential analysis approach. For example, in “Early Integration with High Dimensional Data” (box A) the multiple omic layers are concatenated into a single omics matrix. Then, within a high dimensional mediation framework utilizing feature selection, features from all layers are selected within a single mediation model that accounts for each omic layer or type. As an alternative, “Late Integration with High Dimensional Data” (box C) represents an approach that models each omic layer with a separate high dimensional mediation model for feature selection. Results from each layer can then be aggregated or evaluated in a post hoc integrated analysis or interpretation. Alternative approaches can also be implemented that utilize feature extraction or clustering in concert with either “early” or “late” integration. For example, in “Early Integration with Latent Factors” (box D), the multiple omic layers are first concatenated into a single omics matrix and then a feature extraction/clustering/latent estimation procedure is performed on all features from all omic layers. The resulting clusters are then used in downstream mediation analysis for inference on association with the outcome. Similarly, in “Late Integration with Latent Factors” (box F), the feature extraction/clustering/latent estimation is first performed on each omic layer, followed by downstream mediation analysis. Each omic layer is treated independently and the results from each analysis are integrated in a post hoc framework.
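The “late” pattern described above can be sketched in a few lines: each layer is analysed on its own and the per-layer results are post-processed together. In this sketch, simple marginal correlation tests stand in for the per-layer high dimensional mediation model, and all data, layer names, and effect sizes are simulated assumptions.

```r
set.seed(7)
n <- 200
outcome <- rnorm(n)

# Two hypothetical omic layers; the first feature of each is tied to the outcome
layers <- list(
  proteome   = matrix(rnorm(n * 15), n, 15,
                      dimnames = list(NULL, paste0("prot", 1:15))),
  metabolome = matrix(rnorm(n * 30), n, 30,
                      dimnames = list(NULL, paste0("met", 1:30)))
)
layers$proteome[, 1]   <- layers$proteome[, 1]   + 0.8 * outcome
layers$metabolome[, 1] <- layers$metabolome[, 1] + 0.8 * outcome

# Step 1: analyse every feature within its own layer
p_by_layer <- lapply(layers, function(x) {
  apply(x, 2, function(feature) cor.test(feature, outcome)$p.value)
})

# Step 2: post-process the per-layer results in one table
results <- data.frame(
  layer   = rep(names(p_by_layer), lengths(p_by_layer)),
  feature = unlist(lapply(p_by_layer, names), use.names = FALSE),
  p       = unlist(p_by_layer, use.names = FALSE)
)
results$fdr <- p.adjust(results$p, method = "BH")  # correction across all layers
head(results[order(results$p), ])
```

An early integration analogue would instead bind the two matrices together before any testing or selection takes place.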

To better understand the elements of each type of approach, for the workshop we will discuss the following:

A. Polygenic models and the use of genetic summary statistics data: As an extension of GWAS, these analysis techniques look to combine data into a single risk score (polygenic risk) or use genetic summary statistics from 1) the association of SNPs with an outcome; and 2) the association of SNPs with an intermediate (often high dimensional omic data) to then test the association of the intermediate with the outcome.

B. Interaction analysis: Genome-wide interaction analyses that often focus on a single risk factor and how it interacts with genome-wide SNP data.

C. Clustering: With omic data, clustering often serves as a key analytic technique within the analysis pipeline. This includes: 1) an initial step of dimension reduction or exploration of a single omic layer or multiple omic layers for downstream association analyses; or 2) the post-processing of high dimensional results from pairwise association analyses of omic data.

D. Mediation: To remain connected to the original biological hypothesis that often guides a study, mediation analysis strives to link the relationships between three sets of variables: 1) the risk factors; 2) the mediators or intermediates; and 3) the outcome. Omic data can be measured for each type of variable (most often on the risk factors and/or the intermediates) and high dimensional mediation techniques (including the incorporation of clustering or latent estimation) can be used for analysis.
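To make the four topics above concrete, here is a minimal sketch on one simulated data set using base R only. All variable names, sample sizes, and effect sizes are assumptions chosen for illustration. The `weights` vector stands in for external GWAS summary statistics, no interaction is simulated (so the scan in B runs under the null), and the mediation step uses a single mediator with the simple product-of-coefficients estimate rather than a high dimensional method.

```r
set.seed(2)
n <- 2000

# Hypothetical data: 20 SNP dosages, one binary exposure, 6 omic features
snps <- matrix(rbinom(n * 20, size = 2, prob = 0.3), n, 20,
               dimnames = list(NULL, paste0("snp", 1:20)))
exposure <- rbinom(n, 1, 0.5)
mediator <- 0.5 * exposure + rnorm(n)              # simulated a = 0.5
omic <- cbind(mediator, matrix(rnorm(n * 5), n, 5))
colnames(omic) <- paste0("feat", 1:6)
outcome <- 0.4 * mediator + 0.2 * exposure + rnorm(n)  # b = 0.4, direct = 0.2

# A. Polygenic score: weighted sum of dosages per individual
weights <- rnorm(20, sd = 0.05)                    # placeholder effect sizes
prs <- as.vector(snps %*% weights)

# B. Interaction scan: one model per SNP, keep the exposure-by-SNP p-value
p_int <- apply(snps, 2, function(g) {
  summary(lm(outcome ~ exposure * g))$coefficients["exposure:g", "Pr(>|t|)"]
})

# C. Clustering: reduce the omic layer to a two-level cluster label
cluster <- factor(kmeans(scale(omic), centers = 2, nstart = 20)$cluster)

# D. Mediation: product of coefficients for a single mediator
a <- coef(lm(mediator ~ exposure))[["exposure"]]
fit_y <- lm(outcome ~ exposure + mediator)
b <- coef(fit_y)[["mediator"]]
indirect <- a * b
direct   <- coef(fit_y)[["exposure"]]
total    <- coef(lm(outcome ~ exposure))[["exposure"]]

c(indirect = indirect, direct = direct, total = total)
```

For ordinary least squares fits on the same sample, the total effect equals the sum of the direct and indirect effects, which gives a quick consistency check on the mediation step.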

Overall, we focus on statistical analyses for association testing with multi-omic data. We will not focus on the “lab-based” methods and techniques for measuring each type of omic data set or the omic-specific quality control or processing required and crucial for successful evaluation and use of omic data. We feel that there are ample training opportunities available that describe the details of these analyses for each type of omic data.

To facilitate the sessions and topics covered during the workshop, we have created a pre-workshop lab. The idea of this pre-workshop lab is to provide a self-guided tour to familiarize you with the data and basic statistical analyses that will serve as the foundation for content presented in the SHARP Multi-omics Workshop.

Each section consists of an R Markdown file (.Rmd) and an .html file. The .html file can be opened in a web browser and provides a formatted version of the presented material. At each stage, code can be revealed by clicking on the “code” button. As an alternative, the .Rmd files can be opened within RStudio and each code chunk can be run to explore the analysis in detail. In addition, the .Rmd files (including this file, “PreworkshopLab.Rmd”) can be “knitted” to create the .html file by clicking the “knit” button in RStudio.

The content in this pre-workshop lab will be discussed within the first session of the workshop to provide more background and context.

1. Data Description

Data Overview

This section describes the data that will be used for many of the labs throughout the workshop. We also present an example of how to construct a MultiAssayExperiment object, an R object for storing multi-view or multiple omic data sets measured on the same individuals.

Exposome Data Challenge

The data is from the Exposome Data Analysis Challenge (https://www.isglobal.org/-/exposome-data-analysis-challenge). The Exposome dataset represents a real case scenario of an exposome dataset (based on the HELIX project database) with multiple correlated variables (N>100 exposure variables) arising from general and personal environments at different time points, biological molecular data (multi-omics: DNA methylation, gene expression, proteins, metabolomics, exposome) and multiple clinical phenotypes. The population is drawn from a multi-center study, which results in one of the main confounding structures in the dataset.
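To illustrate why the multi-center design matters, the sketch below simulates an exposure and an outcome that both differ systematically by cohort, and compares an unadjusted model with one that adjusts for cohort. The cohort labels, effect sizes, and variable names are hypothetical and not taken from the HELIX data.

```r
set.seed(6)
n <- 2000

# Six hypothetical cohorts with systematic differences between them
cohort <- factor(sample(paste0("cohort", 1:6), n, replace = TRUE))
center_shift <- as.numeric(cohort)

exposure <- center_shift + rnorm(n)
outcome  <- 0.5 * exposure + 2 * center_shift + rnorm(n)  # simulated effect = 0.5

# The unadjusted estimate absorbs the between-cohort differences
unadjusted <- coef(lm(outcome ~ exposure))[["exposure"]]

# Adjusting for cohort compares individuals within the same cohort
adjusted <- coef(lm(outcome ~ exposure + cohort))[["exposure"]]

c(unadjusted = unadjusted, adjusted = adjusted)
```

In this simulation the adjusted estimate should sit much closer to the simulated effect of 0.5 than the unadjusted one.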

In addition, for the SHARP Multi-omics Workshop, we simulated a germline genetics example dataset.

The HELIX study

The HELIX study represents a collaborative project across six established and ongoing longitudinal population-based birth cohort studies in six European countries (France, Greece, Lithuania, Norway, Spain, and the United Kingdom). HELIX used a multilevel study design with the entire study population totaling 31,472 mother–child pairs, recruited during pregnancy, in the six existing cohorts (first level); a subcohort of 1301 mother–child pairs where biomarkers, omics signatures and child health outcomes were measured at age 6-11 years (second level); and repeat-sampling panel studies with around 150 children and 150 pregnant women aimed at collecting personal exposure data (third level). For more details on the study design, see Vrijheid, Slama, et al. EHP 2014. See https://www.projecthelix.eu/index.php/es/data-inventory for more information regarding the study.

Data Processing and Organization

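# Packages used in this section; work.dir must point to the workshop folder
library(dplyr)                 # %>%
library(tibble)                # column_to_rownames()
library(S4Vectors)             # DataFrame()
library(MultiAssayExperiment)  # MultiAssayExperiment(), ExperimentList()
library(knitr)                 # kable()

load(paste0(work.dir, "/Data/exposome.RData"))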
load(paste0(work.dir, "/Data/proteome.RData"))
load(paste0(work.dir, "/Data/genome.RData"))
load(paste0(work.dir, "/Data/metabol_serum.RData"))
load(paste0(work.dir, "/Data/metabol_urine.RData"))

outdoor.exposures <- exposome[,c("ID", as.character(codebook$variable_name[codebook$domain=="Outdoor exposures"]))] %>% 
  column_to_rownames("ID") %>% 
  t() %>%
  DataFrame()
indoor.air <- exposome[,c("ID", as.character(codebook$variable_name[codebook$domain=="Indoor air"]))] %>% 
  column_to_rownames("ID") %>% 
  t() %>%
  DataFrame()
lifestyles <- exposome[,c("ID", as.character(codebook$variable_name[codebook$domain=="Lifestyles"]))] %>% 
  column_to_rownames("ID") %>% 
  t() %>%
  DataFrame()
chemicals <- exposome[,c("ID", as.character(codebook$variable_name[codebook$domain=="Chemicals"]))] %>% 
  column_to_rownames("ID") %>% 
  t() %>%
  DataFrame()
covariates <- covariates %>% 
  column_to_rownames("ID") %>% 
  t() %>%
  DataFrame()
phenotype <- phenotype %>% as.data.frame() # use as ColData for MultiAssayExperiment format
row.names(phenotype) <- paste0("X", phenotype$ID)

proteome.d <- proteome@assayData$exprs %>% DataFrame()
proteome.cov <- proteome@phenoData@data
proteome.cov <- proteome.cov[stats::complete.cases(proteome.cov),] %>% t() %>% DataFrame()

metabol_urine.d <- metabol_urine@assayData$exprs %>% DataFrame()
metabol_urine.cov <- metabol_urine@phenoData@data
metabol_urine.cov <- metabol_urine.cov[stats::complete.cases(metabol_urine.cov),] %>% t() %>% DataFrame()

metabol_serum.d <- metabol_serum@assayData$exprs %>% DataFrame()
metabol_serum.cov <- metabol_serum@phenoData@data
metabol_serum.cov <- metabol_serum.cov[stats::complete.cases(metabol_serum.cov),] %>% t() %>% DataFrame()

# Note that we include neither the gene expression nor the methylation data
# in the MultiAssayExperiment object as they are large. We also don't recommend
# storing genomewide data in this format. However, we include a small
# (e.g. 1000 SNPs) "genome" germline genetics data set as an example.
helix_ma <- MultiAssayExperiment(
  experiments= ExperimentList("outdoor.exposures"=outdoor.exposures,
                              "indoor.air"=indoor.air,
                              "lifestyles"=lifestyles,
                              "exposome"=chemicals,
                              "covariates"=covariates,
                              "proteome"=proteome.d,
                              "proteome.cov"=proteome.cov,
                              "metabol_urine"=metabol_urine.d,
                              "metabol_urine.cov"=metabol_urine.cov,
                              "metabol_serum"=metabol_serum.d,
                              "metabol_serum.cov"=metabol_serum.cov,
                              "genome"=G), 
  colData = phenotype)

# clean up after creating MultiAssayExperiment data object
rm(outdoor.exposures, indoor.air, lifestyles, chemicals, covariates,
   proteome.d, proteome.cov, metabol_urine.d, metabol_urine.cov,
   metabol_serum.d, metabol_serum.cov, G)
# save(helix_ma, file = paste0(work.dir, "/Data/HELIX.MultiAssayExperiment.RData")) # code to save if needed
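
The reshaping applied to each exposure domain above (select variables through the codebook, move the IDs to the row names, transpose) can be traced on a toy example. This is a base R sketch with made-up values. `codebook_toy` and `exposome_toy` are hypothetical stand-ins for the workshop objects, and the "X" prefix mirrors the naming given to the phenotype row names above.

```r
# Toy stand-ins for the workshop objects (hypothetical values)
codebook_toy <- data.frame(
  variable_name = c("h_NO2_Log", "h_PM_Log", "hs_KIDMED_None"),
  domain        = c("Indoor air", "Indoor air", "Lifestyles")
)
exposome_toy <- data.frame(
  ID             = 1:4,
  h_NO2_Log      = c(2.1, 3.4, 2.8, 3.0),
  h_PM_Log       = c(1.2, 1.9, 1.5, 1.1),
  hs_KIDMED_None = c(3, 5, 2, 4)
)

# Select the variables of one domain through the codebook
vars  <- codebook_toy$variable_name[codebook_toy$domain == "Indoor air"]
block <- exposome_toy[, c("ID", vars)]

# Move IDs to row names, then transpose to features x individuals
rownames(block) <- paste0("X", block$ID)
indoor_toy <- t(as.matrix(block[, vars]))
indoor_toy
```

The result has one row per variable and one column per individual, which is the orientation each entry of the experiment list is given before it is combined with the phenotype data.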

Codebook for Exposures, Covariates, and Phenotypes

kable(codebook, align="c")
variable_name domain family subfamily period location period_postnatal description var_type transformation labels labelsshort
h_abs_ratio_preg_Log h_abs_ratio_preg_Log Outdoor exposures Air Pollution PMAbsorbance Pregnancy Home NA abs value (extrapolated back in time using ratio method)duringpregnancy numeric Natural Logarithm PMabs PMabs
h_no2_ratio_preg_Log h_no2_ratio_preg_Log Outdoor exposures Air Pollution NO2 Pregnancy Home NA no2 value (extrapolated back in time using ratio method)during pregnancy numeric Natural Logarithm NO2 NO2
h_pm10_ratio_preg_None h_pm10_ratio_preg_None Outdoor exposures Air Pollution PM10 Pregnancy Home NA pm10 value (extrapolated back in time using ratio method)duringpregnancy numeric None PM10 PM10
h_pm25_ratio_preg_None h_pm25_ratio_preg_None Outdoor exposures Air Pollution PM2.5 Pregnancy Home NA pm25 value (extrapolated back in time using ratio method)duringpregnancy numeric None PM2.5 PM2.5
hs_no2_dy_hs_h_Log hs_no2_dy_hs_h_Log Outdoor exposures Air Pollution NO2 Postnatal Home Day before examination no2 value (extrapolated back in time using ratio method)one day before hs test at home numeric Natural Logarithm NO2(day) NO2(day)
hs_no2_wk_hs_h_Log hs_no2_wk_hs_h_Log Outdoor exposures Air Pollution NO2 Postnatal Home Week before examination no2 value (extrapolated back in time using ratio method)one week before hs test at home numeric Natural Logarithm NO2(week) NO2(week)
hs_no2_yr_hs_h_Log hs_no2_yr_hs_h_Log Outdoor exposures Air Pollution NO2 Postnatal Home Year before examination no2 value (extrapolated back in time using ratio method)one year before hs test at home numeric Natural Logarithm NO2(year) NO2(year)
hs_pm10_dy_hs_h_None hs_pm10_dy_hs_h_None Outdoor exposures Air Pollution PM10 Postnatal Home Day before examination pm10 value (extrapolated back in time using ratio method)one day before hs test at home numeric None PM10(day) PM10(day)
hs_pm10_wk_hs_h_None hs_pm10_wk_hs_h_None Outdoor exposures Air Pollution PM10 Postnatal Home Week before examination pm10 value (extrapolated back in time using ratio method)one week before hs test at home numeric None PM10(week) PM10(week)
hs_pm10_yr_hs_h_None hs_pm10_yr_hs_h_None Outdoor exposures Air Pollution PM10 Postnatal Home Year before examination pm10 value (extrapolated back in time using ratio method)one year before hs test at home numeric None PM10(year) PM10(year)
hs_pm25_dy_hs_h_None hs_pm25_dy_hs_h_None Outdoor exposures Air Pollution PM2.5 Postnatal Home Day before examination pm25 value (extrapolated back in time using ratio method)one day before hs test at home numeric None PM2.5(day) PM2.5(day)
hs_pm25_wk_hs_h_None hs_pm25_wk_hs_h_None Outdoor exposures Air Pollution PM2.5 Postnatal Home Week before examination pm25 value (extrapolated back in time using ratio method)one week before hs test at home numeric None PM2.5(week) PM2.5(week)
hs_pm25_yr_hs_h_None hs_pm25_yr_hs_h_None Outdoor exposures Air Pollution PM2.5 Postnatal Home Year before examination pm25 value (extrapolated back in time using ratio method)one year before hs test at home numeric None PM2.5(year) PM2.5(year)
hs_pm25abs_dy_hs_h_Log hs_pm25abs_dy_hs_h_Log Outdoor exposures Air Pollution PMAbsorbance Postnatal Home Day before examination pm25 absorbance value (extrapolated back in time using ratio method)one day before hs test at home numeric Natural Logarithm PMabs(day) PMabs(day)
hs_pm25abs_wk_hs_h_Log hs_pm25abs_wk_hs_h_Log Outdoor exposures Air Pollution PMAbsorbance Postnatal Home Week before examination pm25 absorbance value (extrapolated back in time using ratio method)one week before hs test at home numeric Natural Logarithm PMabs(week) PMabs(week)
hs_pm25abs_yr_hs_h_Log hs_pm25abs_yr_hs_h_Log Outdoor exposures Air Pollution PMAbsorbance Postnatal Home Year before examination pm25 absorbance value (extrapolated back in time using ratio method)one year before hs test at home numeric Natural Logarithm PMabs(year) PMabs(year)
h_accesslines300_preg_dic0 h_accesslines300_preg_dic0 Outdoor exposures Built environment Access Pregnancy Home NA Meters of public transport mode lines (only buses) inside each 300m buffer, divided by the buffer area in km2at pregnancy period numeric Dichotomous Access_ lines BPTLine
h_accesspoints300_preg_Log h_accesspoints300_preg_Log Outdoor exposures Built environment Access Pregnancy Home NA Number of bus public transport mode stops inside each 300m buffer, divided by the buffer area in km2at pregnancy period numeric Natural Logarithm Access_stops BPTStop
h_builtdens300_preg_Sqrt h_builtdens300_preg_Sqrt Outdoor exposures Built environment Building density Pregnancy Home NA Building density (m2 built/km2) within a buffers of 300mat pregnancy period numeric Square root Building BuildDens
h_connind300_preg_Sqrt h_connind300_preg_Sqrt Outdoor exposures Built environment Connectivity Pregnancy Home NA Connectivity density (number of intersections / km2) within a buffer of 300mat pregnancy period numeric Square root Connectivity Connec
h_fdensity300_preg_Log h_fdensity300_preg_Log Outdoor exposures Built environment Facility Pregnancy Home NA Number of facilities present divided by the area of the 300 meters buffer at pregnancy period numeric Natural Logarithm Facility_dens FacDens
h_frichness300_preg_None h_frichness300_preg_None Outdoor exposures Built environment Facility Pregnancy Home NA Number of different facility types present divided by the maximum potential number of facility types (at a 300m buffer)at pregnancy period numeric None Facility_rich FacRich
h_landuseshan300_preg_None h_landuseshan300_preg_None Outdoor exposures Built environment Land use Pregnancy Home NA Landuse Shannon’s Evenness Indexat pregnancy period numeric None Land use Land use
h_popdens_preg_Sqrt h_popdens_preg_Sqrt Outdoor exposures Built environment Population Pregnancy Home NA population densityat pregnancy period numeric Square root Population Pop
h_walkability_mean_preg_None h_walkability_mean_preg_None Outdoor exposures Built environment Walkability Pregnancy Home NA Walkability index (as mean of deciles of facility richness index, landuse shannon’s Evenness Index, population density, connectivity density)at pregnancy period numeric None Walkability Walkability
hs_accesslines300_h_dic0 hs_accesslines300_h_dic0 Outdoor exposures Built environment Access Postnatal Home NA Meters of public transport mode lines (only buses) inside each 300m buffer, divided by the buffer area in km2at home numeric Dichotomous Access_ lines_home BPTLineH
hs_accesspoints300_h_Log hs_accesspoints300_h_Log Outdoor exposures Built environment Access Postnatal Home NA Number of bus public transport mode stops inside each 300m buffer, divided by the buffer area in km2at home numeric Natural Logarithm Access_stops_home BPTStopH
hs_builtdens300_h_Sqrt hs_builtdens300_h_Sqrt Outdoor exposures Built environment Building density Postnatal Home NA Building density (m2 built/km2) within a buffers of 300mat home numeric Square root Building_home BuildH
hs_connind300_h_Log hs_connind300_h_Log Outdoor exposures Built environment Connectivity Postnatal Home NA Connectivity density (number of intersections / km2) within a buffer of 300mat home numeric Natural Logarithm Connectivity ConnH
hs_fdensity300_h_Log hs_fdensity300_h_Log Outdoor exposures Built environment Facility Postnatal Home NA Number of facilities present divided by the area of the 300 meters buffer at home numeric Natural Logarithm Facility_dens FacDenH
hs_landuseshan300_h_None hs_landuseshan300_h_None Outdoor exposures Built environment Land use Postnatal Home NA Landuse Shannon’s Evenness Indexat home numeric None Land use Land useH
hs_popdens_h_Sqrt hs_popdens_h_Sqrt Outdoor exposures Built environment Population Postnatal Home NA population densityat home numeric Square root Population Population
hs_walkability_mean_h_None hs_walkability_mean_h_None Outdoor exposures Built environment Walkability Postnatal Home NA walkability index (as mean of deciles of facility richness index, landuse shannon’s Evenness Index, population density, connectivity density)at home numeric None Walkability Walkability
hs_accesslines300_s_dic0 hs_accesslines300_s_dic0 Outdoor exposures Built environment Access Postnatal School NA Meters of public transport mode lines (only buses) inside each 300m buffer, divided by the buffer area in km2at school numeric Dichotomous Access_ lines_school BPTLineS
hs_accesspoints300_s_Log hs_accesspoints300_s_Log Outdoor exposures Built environment Access Postnatal School NA Number of bus public transport mode stops inside each 300m buffer, divided by the buffer area in km2at school numeric Natural Logarithm Access_stops_school BPTStopS
hs_builtdens300_s_Sqrt hs_builtdens300_s_Sqrt Outdoor exposures Built environment Building density Postnatal School NA Building density (m2 built/km2) within a buffers of 300mat school numeric Square root Building_school_school BuildS
hs_connind300_s_Log hs_connind300_s_Log Outdoor exposures Built environment Connectivity Postnatal School NA Connectivity density (number of intersections / km2) within a buffer of 300mat school numeric Natural Logarithm Connectivity_school ConnS
hs_fdensity300_s_Log hs_fdensity300_s_Log Outdoor exposures Built environment Facility Postnatal School NA Number of facilities present divided by the area of the 300 meters buffer at school numeric Natural Logarithm Facility_dens_school FacDenS
hs_landuseshan300_s_None hs_landuseshan300_s_None Outdoor exposures Built environment Land use Postnatal School NA Landuse Shannon’s Evenness Indexat school numeric None Land use_school Land useS
hs_popdens_s_Sqrt hs_popdens_s_Sqrt Outdoor exposures Built environment Population Postnatal School NA population densityat school numeric Square root Population_school PopS
h_Absorbance_Log h_Absorbance_Log Indoor air Indoor air PM Postnatal Home NA Concentration of absorbance numeric Natural Logarithm PMabs in PMabsIN
h_Benzene_Log h_Benzene_Log Indoor air Indoor air BTEX Postnatal Home NA Concentration of indoor Benzene numeric Natural Logarithm Benzene in Benzene
h_NO2_Log h_NO2_Log Indoor air Indoor air NO2 Postnatal Home NA Concentration of indoor NO2 numeric Natural Logarithm NO2 in NO2IN
h_PM_Log h_PM_Log Indoor air Indoor air PM Postnatal Home NA Concentration of particulate matter numeric Natural Logarithm PM2.5 in PM2.5IN
h_TEX_Log h_TEX_Log Indoor air Indoor air BTEX Postnatal Home NA Concentration of indoor BTEX (sum) numeric Natural Logarithm BTEX in BTEX
e3_alcpreg_yn_None e3_alcpreg_yn_None Lifestyles Lifestyle Prenatal Alcohol Pregnancy NA NA alcohol during pregnancy yes/no (0=none or <1/m for KANC) factor None Alcohol Alcohol
h_bfdur_Ter h_bfdur_Ter Lifestyles Lifestyle Diet Postnatal NA NA Breastfeeding duration (weeks) factor Tertiles Breastfeeding Breastfeeding
h_cereal_preg_Ter h_cereal_preg_Ter Lifestyles Lifestyle Diet Pregnancy NA NA cereal comsumption during pregnancy (times/week) factor Tertiles Cereals Cereals
h_dairy_preg_Ter h_dairy_preg_Ter Lifestyles Lifestyle Diet Pregnancy NA NA dairy comsumption during pregnancy (times/week) factor Tertiles Dairy Dairy
h_fastfood_preg_Ter h_fastfood_preg_Ter Lifestyles Lifestyle Diet Pregnancy NA NA fast food comsumption during pregnancy (times/week) factor Tertiles Fastfood Fastfood
h_fish_preg_Ter h_fish_preg_Ter Lifestyles Lifestyle Diet Pregnancy NA NA fish comsumption during pregnancy (times/week) factor Tertiles Fish Fish
h_folic_t1_None h_folic_t1_None Lifestyles Lifestyle Folic acid consumption Pregnancy NA NA folic acid supplementation during pregnancy factor None Folic acid Folic acid
h_fruit_preg_Ter h_fruit_preg_Ter Lifestyles Lifestyle Diet Pregnancy NA NA fruit comsumption during pregnancy (times/week) factor Tertiles Fruits Fruits
h_legume_preg_Ter h_legume_preg_Ter Lifestyles Lifestyle Diet Pregnancy NA NA legume comsumption during pregnancy (times/week) factor Tertiles Legumes Legumes
h_meat_preg_Ter h_meat_preg_Ter Lifestyles Lifestyle Diet Pregnancy NA NA meat comsumption during pregnancy (times/week) factor Tertiles Meat Meat
h_pamod_t3_None h_pamod_t3_None Lifestyles Lifestyle Physical activity Pregnancy NA NA Walking and/or cycling acitivity during pregnancy (frequency) factor None PAmoderate PAModp
h_pavig_t3_None h_pavig_t3_None Lifestyles Lifestyle Physical activity Pregnancy NA NA Exercise or sport acitivity during pregnancy (frequency) factor None PAvigorous PAVig
h_veg_preg_Ter h_veg_preg_Ter Lifestyles Lifestyle Diet Pregnancy NA NA vegetables comsumption during pregnancy (times/week) factor Tertiles Vegetables Vegetables
hs_bakery_prod_Ter hs_bakery_prod_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: bakery products (hs_cookies + hs_pastries) factor Tertiles Bakery prod BakeProd
hs_beverages_Ter hs_beverages_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: beverages (hs_dietsoda+hs_soda) factor Tertiles Soda Soda
hs_break_cer_Ter hs_break_cer_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: breakfast cereal (hs_sugarcer+hs_othcer) factor Tertiles BF cereals BFcereals
hs_caff_drink_Ter hs_caff_drink_Ter Lifestyles Lifestyle Diet Postnatal NA NA Drinks a caffeinated or æenergy drink (eg coca-cola, diet-coke, redbull) factor Tertiles Caffeine Caffeine
hs_dairy_Ter hs_dairy_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: dairy (hs_cheese + hs_milk + hs_yogurt+ hs_probiotic+ hs_desert) factor Tertiles Dairy Dairy
hs_fastfood_Ter hs_fastfood_Ter Lifestyles Lifestyle Diet Postnatal NA NA Visits a fast food restaurant/take away factor Tertiles Fastfood Fastfood
hs_KIDMED_None hs_KIDMED_None Lifestyles Lifestyle Diet Postnatal NA NA Sum of KIDMED indices, without index9 numeric None KIDMED KIDMED
hs_mvpa_prd_alt_None hs_mvpa_prd_alt_None Lifestyles Lifestyle Physical activity Postnatal NA NA Clean & Over-reporting of Moderate-to-Vigorous Physical Activity (min/day) numeric None PA PA
hs_org_food_Ter hs_org_food_Ter Lifestyles Lifestyle Diet Postnatal NA NA Eats organic food factor Tertiles Organicfood Organicfood
hs_pet_cat_r2_None hs_pet_cat_r2_None Lifestyles Lifestyle Allergens Postnatal NA NA Do you have any cats that live mainly in your home? factor None Cat_home Cat
hs_pet_dog_r2_None hs_pet_dog_r2_None Lifestyles Lifestyle Allergens Postnatal NA NA Do you have any dogs that live mainly in your home? factor None Dog_home Dog
hs_pet_None hs_pet_None Lifestyles Lifestyle Allergens Postnatal NA NA Do you have any other pets that live mainly in your home? factor None Other pets_home Pets
hs_proc_meat_Ter hs_proc_meat_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: processed meat (hs_coldmeat+hs_ham) factor Tertiles Processed meat ProcMeat
hs_readymade_Ter hs_readymade_Ter Lifestyles Lifestyle Diet Postnatal NA NA Eats a æready-made supermarket meal factor Tertiles Ready made food ReadyFood
hs_sd_wk_None hs_sd_wk_None Lifestyles Lifestyle Physical activity Postnatal NA NA sedentary behaviour (min/day) numeric None Sedentary Sedentary
hs_total_bread_Ter hs_total_bread_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: bread (hs_darkbread+hs_whbread) factor Tertiles Bread Bread
hs_total_cereal_Ter hs_total_cereal_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: cereal (hs_darkbread + hs_whbread + hs_rice_pasta + hs_sugarcer + hs_othcer + hs_rusks) factor Tertiles Cereals Cereals
hs_total_fish_Ter hs_total_fish_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: fish and seafood (hs_canfish+hs_oilyfish+hs_whfish+hs_seafood) factor Tertiles Fish Fish
hs_total_fruits_Ter hs_total_fruits_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: fruits (hs_canfruit+hs_dryfruit+hs_freshjuice+hs_fruits) factor Tertiles Fruits Fruits
hs_total_lipids_Ter hs_total_lipids_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: Added fat factor Tertiles Diet fat Diet fat
hs_total_meat_Ter hs_total_meat_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: meat (hs_coldmeat+hs_ham+hs_poultry+hs_redmeat) factor Tertiles Meat Meat
hs_total_potatoes_Ter hs_total_potatoes_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: potatoes (hs_frenchfries+hs_potatoes) factor Tertiles Potatoes Potatoes
hs_total_sweets_Ter hs_total_sweets_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: sweets (hs_choco + hs_sweets + hs_sugar) factor Tertiles Sweets Sweets
hs_total_veg_Ter hs_total_veg_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: vegetables (hs_cookveg+hs_rawveg) factor Tertiles Vegetables Vegetables
hs_total_yog_Ter hs_total_yog_Ter Lifestyles Lifestyle Diet Postnatal NA NA Food group: yogurt (hs_yogurt+hs_probiotic) factor Tertiles Yogurt Yogurt
hs_dif_hours_total_None hs_dif_hours_total_None Lifestyles Lifestyle Sleep Postnatal NA NA Total hours of sleep (mean weekdays and night) numeric None Sleep Sleep
hs_as_c_Log2 hs_as_c_Log2 Chemicals Metals As Postnatal NA NA Arsenic (As) in child numeric Logarithm base 2 As As
hs_as_m_Log2 hs_as_m_Log2 Chemicals Metals As Pregnancy NA NA Arsenic (As) in mother numeric Logarithm base 2 As As
hs_cd_c_Log2 hs_cd_c_Log2 Chemicals Metals Cd Postnatal NA NA Cadmium (Cd) in child numeric Logarithm base 2 Cd Cd
hs_cd_m_Log2 hs_cd_m_Log2 Chemicals Metals Cd Pregnancy NA NA Cadmium (Cd) in mother numeric Logarithm base 2 Cd Cd
hs_co_c_Log2 hs_co_c_Log2 Chemicals Metals Co Postnatal NA NA Cobalt (Co) in child numeric Logarithm base 2 Co Co
hs_co_m_Log2 hs_co_m_Log2 Chemicals Metals Co Pregnancy NA NA Cobalt (Co) in mother numeric Logarithm base 2 Co Co
hs_cs_c_Log2 hs_cs_c_Log2 Chemicals Metals Cs Postnatal NA NA Caesium (Cs) in child numeric Logarithm base 2 Cs Cs
hs_cs_m_Log2 hs_cs_m_Log2 Chemicals Metals Cs Pregnancy NA NA Caesium (Cs) in mother numeric Logarithm base 2 Cs Cs
hs_cu_c_Log2 hs_cu_c_Log2 Chemicals Metals Cu Postnatal NA NA Copper (Cu) in child numeric Logarithm base 2 Cu Cu
hs_cu_m_Log2 hs_cu_m_Log2 Chemicals Metals Cu Pregnancy NA NA Copper (Cu) in mother numeric Logarithm base 2 Cu Cu
hs_hg_c_Log2 hs_hg_c_Log2 Chemicals Metals Hg Postnatal NA NA Mercury (Hg) in child numeric Logarithm base 2 Hg Hg
hs_hg_m_Log2 hs_hg_m_Log2 Chemicals Metals Hg Pregnancy NA NA Mercury (Hg) in mother numeric Logarithm base 2 Hg Hg
hs_mn_c_Log2 hs_mn_c_Log2 Chemicals Metals Mn Postnatal NA NA Manganese (Mn) in child numeric Logarithm base 2 Mn Mn
hs_mn_m_Log2 hs_mn_m_Log2 Chemicals Metals Mn Pregnancy NA NA Manganese (Mn) in mother numeric Logarithm base 2 Mn Mn
hs_mo_c_Log2 hs_mo_c_Log2 Chemicals Metals Mo Postnatal NA NA Molybdenum (Mo) in child numeric Logarithm base 2 Mo Mo
hs_mo_m_Log2 hs_mo_m_Log2 Chemicals Metals Mo Pregnancy NA NA Molybdenum (Mo) in mother numeric Logarithm base 2 Mo Mo
hs_pb_c_Log2 hs_pb_c_Log2 Chemicals Metals Pb Postnatal NA NA Lead (Pb) in child numeric Logarithm base 2 Pb Pb
hs_pb_m_Log2 hs_pb_m_Log2 Chemicals Metals Pb Pregnancy NA NA Lead (Pb) in mother numeric Logarithm base 2 Pb Pb
hs_tl_cdich_None hs_tl_cdich_None Chemicals Metals Tl Postnatal NA NA Dichotomous variable of thallium (Tl) in child factor None Tl Tl
hs_tl_mdich_None hs_tl_mdich_None Chemicals Metals Tl Pregnancy NA NA Dichotomous variable of thallium (Tl) in mother factor None Tl Tl
h_humidity_preg_None h_humidity_preg_None Outdoor exposures Meteorological Humidity Pregnancy Home NA Humidity average during pregnancy numeric None Hum. Hum
h_pressure_preg_None h_pressure_preg_None Outdoor exposures Meteorological Pressure Pregnancy Home NA Pressure average during pregnancy numeric None Pres.  Pres
h_temperature_preg_None h_temperature_preg_None Outdoor exposures Meteorological Temperature Pregnancy Home NA Temperature average during pregnancy numeric None T T
hs_hum_mt_hs_h_None hs_hum_mt_hs_h_None Outdoor exposures Meteorological Humidity Postnatal Home Month before examination Relative humidity one month before at home numeric None Hum.(month) Hum.(month)
hs_tm_mt_hs_h_None hs_tm_mt_hs_h_None Outdoor exposures Meteorological Temperature Postnatal Home Month before examination Mean temperature one month before at home numeric None T(month) T(month)
hs_uvdvf_mt_hs_h_None hs_uvdvf_mt_hs_h_None Outdoor exposures Meteorological UV Postnatal Home Month before examination Vitamin-D UV dose per subject one month before at home numeric None UV(month) UV(month)
hs_hum_dy_hs_h_None hs_hum_dy_hs_h_None Outdoor exposures Meteorological Humidity Postnatal Home Day before examination Relative humidity one day before at home numeric None T(day) T(day)
hs_hum_wk_hs_h_None hs_hum_wk_hs_h_None Outdoor exposures Meteorological Humidity Postnatal Home Week before examination Relative humidity one week before at home numeric None Hum.(week) Hum.(week)
hs_tm_dy_hs_h_None hs_tm_dy_hs_h_None Outdoor exposures Meteorological Temperature Postnatal Home Day before examination Mean temperature one day before at home numeric None T(day) T(day)
hs_tm_wk_hs_h_None hs_tm_wk_hs_h_None Outdoor exposures Meteorological Temperature Postnatal Home Week before examination Mean temperature one week before at home numeric None T(week) T(week)
hs_uvdvf_dy_hs_h_None hs_uvdvf_dy_hs_h_None Outdoor exposures Meteorological UV Postnatal Home Day before examination Vitamin-D UV dose per subject one day before at home numeric None UV(day) UV(day)
hs_uvdvf_wk_hs_h_None hs_uvdvf_wk_hs_h_None Outdoor exposures Meteorological UV Postnatal Home Week before examination Vitamin-D UV dose per subject one week before at home numeric None UV(week) UV(week)
hs_blueyn300_s_None hs_blueyn300_s_None Outdoor exposures Natural Spaces Blue Postnatal School NA Is there a bluespace in a distance of 300m? at school factor None Blue_school BlueS
h_blueyn300_preg_None h_blueyn300_preg_None Outdoor exposures Natural Spaces Blue Pregnancy Home NA Is there a bluespace in a distance of 300m? at pregnancy period factor None Blue space Blue
h_greenyn300_preg_None h_greenyn300_preg_None Outdoor exposures Natural Spaces Green Pregnancy Home NA Is there a greenspace in a distance of 300m? at pregnancy period factor None Green space Green
h_ndvi100_preg_None h_ndvi100_preg_None Outdoor exposures Natural Spaces NDVI Pregnancy Home NA Average of NDVI values within a buffer of 100m at pregnancy period numeric None NDVI NDVI
hs_greenyn300_s_None hs_greenyn300_s_None Outdoor exposures Natural Spaces Green Postnatal School NA Is there a greenspace in a distance of 300m? at school factor None Green_school GreenS
hs_blueyn300_h_None hs_blueyn300_h_None Outdoor exposures Natural Spaces Blue Postnatal Home NA Is there a bluespace in a distance of 300m? at home factor None Blue_home BlueH
hs_greenyn300_h_None hs_greenyn300_h_None Outdoor exposures Natural Spaces Green Postnatal Home NA Is there a greenspace in a distance of 300m? at home factor None Green_home GreenH
hs_ndvi100_h_None hs_ndvi100_h_None Outdoor exposures Natural Spaces NDVI Postnatal Home NA Average of NDVI values within a buffer of 100m at home numeric None NDVI_home NDVIH
hs_ndvi100_s_None hs_ndvi100_s_None Outdoor exposures Natural Spaces NDVI Postnatal School NA Average of NDVI values within a buffer of 100m at school numeric None NDVI_school NDVIS
h_lden_cat_preg_None h_lden_cat_preg_None Outdoor exposures Noise Noise Pregnancy Home NA Categorized lden (day, evening, night) at pregnancy period numeric None Traffic noise_24h Noise
hs_ln_cat_h_None hs_ln_cat_h_None Outdoor exposures Noise Noise Postnatal Home NA Categorized ln (night) at home factor None Traffic noise_night NoiseNight
hs_lden_cat_s_None hs_lden_cat_s_None Outdoor exposures Noise Noise Postnatal School NA Categorized lden (one day, evening, night) at school factor None Traffic noise_24h school NoiseS
hs_dde_cadj_Log2 hs_dde_cadj_Log2 Chemicals Organochlorines DDE Postnatal NA NA Dichlorodiphenyldichloroethylene (DDE) in child adjusted for lipids numeric Logarithm base 2 DDE DDE
hs_dde_madj_Log2 hs_dde_madj_Log2 Chemicals Organochlorines DDE Pregnancy NA NA Dichlorodiphenyldichloroethylene (DDE) in mother adjusted for lipids numeric Logarithm base 2 DDE DDE
hs_ddt_cadj_Log2 hs_ddt_cadj_Log2 Chemicals Organochlorines DDT Postnatal NA NA Dichlorodiphenyltrichloroethane (DDT) in child adjusted for lipids numeric Logarithm base 2 DDT DDT
hs_ddt_madj_Log2 hs_ddt_madj_Log2 Chemicals Organochlorines DDT Pregnancy NA NA Dichlorodiphenyltrichloroethane (DDT) in mother adjusted for lipids numeric Logarithm base 2 DDT DDT
hs_hcb_cadj_Log2 hs_hcb_cadj_Log2 Chemicals Organochlorines HCB Postnatal NA NA Hexachlorobenzene (HCB) in child adjusted for lipids numeric Logarithm base 2 HCB HCB
hs_hcb_madj_Log2 hs_hcb_madj_Log2 Chemicals Organochlorines HCB Pregnancy NA NA Hexachlorobenzene (HCB) in mother adjusted for lipids numeric Logarithm base 2 HCB HCB
hs_pcb118_cadj_Log2 hs_pcb118_cadj_Log2 Chemicals Organochlorines PCBs Postnatal NA NA Polychlorinated biphenyl -118 (PCB-118) in child adjusted for lipids numeric Logarithm base 2 PCB 118 PCB118
hs_pcb118_madj_Log2 hs_pcb118_madj_Log2 Chemicals Organochlorines PCBs Pregnancy NA NA Polychlorinated biphenyl-118 (PCB-118) in mother adjusted for lipids numeric Logarithm base 2 PCB 118 PCB118
hs_pcb138_cadj_Log2 hs_pcb138_cadj_Log2 Chemicals Organochlorines PCBs Postnatal NA NA Polychlorinated biphenyl-138 (PCB-138) in child adjusted for lipids numeric Logarithm base 2 PCB 138 PCB138
hs_pcb138_madj_Log2 hs_pcb138_madj_Log2 Chemicals Organochlorines PCBs Pregnancy NA NA Polychlorinated biphenyl-138 (PCB-138) in mother adjusted for lipids numeric Logarithm base 2 PCB 138 PCB138
hs_pcb153_cadj_Log2 hs_pcb153_cadj_Log2 Chemicals Organochlorines PCBs Postnatal NA NA Polychlorinated biphenyl-153 (PCB-153) in child adjusted for lipids numeric Logarithm base 2 PCB 153 PCB153
hs_pcb153_madj_Log2 hs_pcb153_madj_Log2 Chemicals Organochlorines PCBs Pregnancy NA NA Polychlorinated biphenyl-153 (PCB-153) in mother adjusted for lipids numeric Logarithm base 2 PCB 153 PCB153
hs_pcb170_cadj_Log2 hs_pcb170_cadj_Log2 Chemicals Organochlorines PCBs Postnatal NA NA Polychlorinated biphenyl-170 (PCB-170) in child adjusted for lipids numeric Logarithm base 2 PCB 170 PCB170
hs_pcb170_madj_Log2 hs_pcb170_madj_Log2 Chemicals Organochlorines PCBs Pregnancy NA NA Polychlorinated biphenyl-170 (PCB-170) in mother adjusted for lipids numeric Logarithm base 2 PCB 170 PCB170
hs_pcb180_cadj_Log2 hs_pcb180_cadj_Log2 Chemicals Organochlorines PCBs Postnatal NA NA Polychlorinated biphenyl-180 (PCB-180) in child adjusted for lipids numeric Logarithm base 2 PCB 180 PCB180
hs_pcb180_madj_Log2 hs_pcb180_madj_Log2 Chemicals Organochlorines PCBs Pregnancy NA NA Polychlorinated biphenyl-180 (PCB-180) in mother adjusted for lipids numeric Logarithm base 2 PCB 180 PCB180
hs_sumPCBs5_cadj_Log2 hs_sumPCBs5_cadj_Log2 Chemicals Organochlorines PCBs Postnatal NA NA Sum of PCBs in child adjusted for lipids (4 cohorts) numeric Logarithm base 2 PCBs SumPCB
hs_sumPCBs5_madj_Log2 hs_sumPCBs5_madj_Log2 Chemicals Organochlorines PCBs Pregnancy NA NA Sum of PCBs in mother adjusted for lipids (5 cohorts) numeric Logarithm base 2 PCBs SumPCB
hs_dep_cadj_Log2 hs_dep_cadj_Log2 Chemicals Organophosphate pesticides DEP Postnatal NA NA Diethyl phosphate (DEP) in child adjusted for creatinine numeric Logarithm base 2 DEP DEP
hs_dep_madj_Log2 hs_dep_madj_Log2 Chemicals Organophosphate pesticides DEP Pregnancy NA NA Diethyl phosphate (DEP) in mother adjusted for creatinine numeric Logarithm base 2 DEP DEP
hs_detp_cadj_Log2 hs_detp_cadj_Log2 Chemicals Organophosphate pesticides DETP Postnatal NA NA Diethyl thiophosphate (DETP) in child adjusted for creatinine numeric Logarithm base 2 DETP DETP
hs_detp_madj_Log2 hs_detp_madj_Log2 Chemicals Organophosphate pesticides DETP Pregnancy NA NA Diethyl thiophosphate (DETP) in mother adjusted for creatinine numeric Logarithm base 2 DETP DETP
hs_dmdtp_cdich_None hs_dmdtp_cdich_None Chemicals Organophosphate pesticides DMDTP Postnatal NA NA Dichotomous variable of dimethyl dithiophosphate (DMDTP) in child factor None DMDTP DMDTP
hs_dmp_cadj_Log2 hs_dmp_cadj_Log2 Chemicals Organophosphate pesticides DMP Postnatal NA NA Dimethyl phosphate (DMP) in child adjusted for creatinine numeric Logarithm base 2 DMP DMP
hs_dmp_madj_Log2 hs_dmp_madj_Log2 Chemicals Organophosphate pesticides DMP Pregnancy NA NA Dimethyl phosphate (DMP) in mother adjusted for creatinine numeric Logarithm base 2 DMP DMP
hs_dmtp_cadj_Log2 hs_dmtp_cadj_Log2 Chemicals Organophosphate pesticides DMTP Postnatal NA NA Dimethyl thiophosphate (DMTP) in child adjusted for creatinine numeric Logarithm base 2 DMDTP DMTP
hs_dmtp_madj_Log2 hs_dmtp_madj_Log2 Chemicals Organophosphate pesticides DMTP Pregnancy NA NA Dimethyl thiophosphate (DMTP) in child adjusted for creatinine numeric Logarithm base 2 DMDTP DMTP
hs_pbde153_cadj_Log2 hs_pbde153_cadj_Log2 Chemicals Polybrominated diphenyl ethers (PBDE) PBDE153 Postnatal NA NA Polybrominated diphenyl ether-153 (PBDE-153) in child adjusted for lipids numeric Logarithm base 2 PBDE 153 PBDE153
hs_pbde153_madj_Log2 hs_pbde153_madj_Log2 Chemicals Polybrominated diphenyl ethers (PBDE) PBDE153 Pregnancy NA NA Polybrominated diphenyl ether-153 (PBDE-153) in mother adjusted for lipids numeric Logarithm base 2 PBDE 153 PBDE153
hs_pbde47_cadj_Log2 hs_pbde47_cadj_Log2 Chemicals Polybrominated diphenyl ethers (PBDE) PBDE47 Postnatal NA NA Polybrominated diphenyl ether-47 (PBDE-47) in child adjusted for lipids numeric Logarithm base 2 PBDE 47 PBDE47
hs_pbde47_madj_Log2 hs_pbde47_madj_Log2 Chemicals Polybrominated diphenyl ethers (PBDE) PBDE47 Pregnancy NA NA Polybrominated diphenyl ether-47 (PBDE-47) in mother adjusted for lipids numeric Logarithm base 2 PBDE 47 PBDE47
hs_pfhxs_c_Log2 hs_pfhxs_c_Log2 Chemicals Per- and polyfluoroalkyl substances (PFAS) PFHXS Postnatal NA NA Perfluorohexane sulfonate (PFHXS) in child numeric Logarithm base 2 PFHXS PFHXS
hs_pfhxs_m_Log2 hs_pfhxs_m_Log2 Chemicals Per- and polyfluoroalkyl substances (PFAS) PFHXS Pregnancy NA NA Perfluorohexane sulfonate (PFHXS) in mother numeric Logarithm base 2 PFHXS PFHXS
hs_pfna_c_Log2 hs_pfna_c_Log2 Chemicals Per- and polyfluoroalkyl substances (PFAS) PFNA Postnatal NA NA Perfluorononanoate (PFNA) in child numeric Logarithm base 2 PFNA PFNA
hs_pfna_m_Log2 hs_pfna_m_Log2 Chemicals Per- and polyfluoroalkyl substances (PFAS) PFNA Pregnancy NA NA Perfluorononanoate (PFNA) in mother numeric Logarithm base 2 PFNA PFNA
hs_pfoa_c_Log2 hs_pfoa_c_Log2 Chemicals Per- and polyfluoroalkyl substances (PFAS) PFOA Postnatal NA NA Perfluorooctanoate (PFOA) in child numeric Logarithm base 2 PFOA PFOA
hs_pfoa_m_Log2 hs_pfoa_m_Log2 Chemicals Per- and polyfluoroalkyl substances (PFAS) PFOA Pregnancy NA NA Perfluorooctanoate (PFOA) in mother numeric Logarithm base 2 PFOA PFOA
hs_pfos_c_Log2 hs_pfos_c_Log2 Chemicals Per- and polyfluoroalkyl substances (PFAS) PFOS Postnatal NA NA Perfluorooctane sulfonate (PFOS) in child numeric Logarithm base 2 PFOS PFOS
hs_pfos_m_Log2 hs_pfos_m_Log2 Chemicals Per- and polyfluoroalkyl substances (PFAS) PFOS Pregnancy NA NA Perfluorooctane sulfonate (PFOS) in mother numeric Logarithm base 2 PFOS PFOS
hs_pfunda_c_Log2 hs_pfunda_c_Log2 Chemicals Per- and polyfluoroalkyl substances (PFAS) PFUNDA Postnatal NA NA Perfluoroundecanoate (PFUNDA) in child numeric Logarithm base 2 PFUNDA PFUNDA
hs_pfunda_m_Log2 hs_pfunda_m_Log2 Chemicals Per- and polyfluoroalkyl substances (PFAS) PFUNDA Pregnancy NA NA Perfluoroundecanoate (PFUNDA) in mother numeric Logarithm base 2 PFUNDA PFUNDA
hs_bpa_cadj_Log2 hs_bpa_cadj_Log2 Chemicals Phenols BPA Postnatal NA NA Bisphenol A (BPA) in child adjusted for creatinine numeric Logarithm base 2 BPA BPA
hs_bpa_madj_Log2 hs_bpa_madj_Log2 Chemicals Phenols BPA Pregnancy NA NA Bisphenol A (BPA) in mother adjusted for creatinine numeric Logarithm base 2 BPA BPA
hs_bupa_cadj_Log2 hs_bupa_cadj_Log2 Chemicals Phenols BUPA Postnatal NA NA N-Butyl paraben (BUPA) in child adjusted for creatinine numeric Logarithm base 2 BUPA BUPA
hs_bupa_madj_Log2 hs_bupa_madj_Log2 Chemicals Phenols BUPA Pregnancy NA NA N-Butyl paraben (BUPA) in mother adjusted for creatinine numeric Logarithm base 2 BUPA BUPA
hs_etpa_cadj_Log2 hs_etpa_cadj_Log2 Chemicals Phenols ETPA Postnatal NA NA Ethyl paraben (ETPA) in child adjusted for creatinine numeric Logarithm base 2 ETPA ETPA
hs_etpa_madj_Log2 hs_etpa_madj_Log2 Chemicals Phenols ETPA Pregnancy NA NA Ethyl paraben (ETPA) in mother adjusted for creatinine numeric Logarithm base 2 ETPA ETPA
hs_mepa_cadj_Log2 hs_mepa_cadj_Log2 Chemicals Phenols MEPA Postnatal NA NA Methyl paraben (MEPA) in child adjusted for creatinine numeric Logarithm base 2 MEPA MEPA
hs_mepa_madj_Log2 hs_mepa_madj_Log2 Chemicals Phenols MEPA Pregnancy NA NA Methyl paraben (MEPA) in mother adjusted for creatinine numeric Logarithm base 2 MEPA MEPA
hs_oxbe_cadj_Log2 hs_oxbe_cadj_Log2 Chemicals Phenols OXBE Postnatal NA NA Oxybenzone (OXBE) in child adjusted for creatinine numeric Logarithm base 2 OXBE OXBE
hs_oxbe_madj_Log2 hs_oxbe_madj_Log2 Chemicals Phenols OXBE Pregnancy NA NA Oxybenzone (OXBE) in mother adjusted for creatinine numeric Logarithm base 2 OXBE OXBE
hs_prpa_cadj_Log2 hs_prpa_cadj_Log2 Chemicals Phenols PRPA Postnatal NA NA Propyl paraben (PRPA) in child adjusted for creatinine numeric Logarithm base 2 PRPA PRPA
hs_prpa_madj_Log2 hs_prpa_madj_Log2 Chemicals Phenols PRPA Pregnancy NA NA Propyl paraben (PRPA) in mother adjusted for creatinine numeric Logarithm base 2 PRPA PRPA
hs_trcs_cadj_Log2 hs_trcs_cadj_Log2 Chemicals Phenols TRCS Postnatal NA NA Triclosan (TRCS) in child adjusted for creatinine numeric Logarithm base 2 TRCS TRCS
hs_trcs_madj_Log2 hs_trcs_madj_Log2 Chemicals Phenols TRCS Pregnancy NA NA Triclosan (TRCS) in mother adjusted for creatinine numeric Logarithm base 2 TRCS TRCS
hs_mbzp_cadj_Log2 hs_mbzp_cadj_Log2 Chemicals Phthalates MBZP Postnatal NA NA Mono benzyl phthalate (MBzP) in child adjusted for creatinine numeric Logarithm base 2 MBZP MBZP
hs_mbzp_madj_Log2 hs_mbzp_madj_Log2 Chemicals Phthalates MBZP Pregnancy NA NA Mono benzyl phthalate (MBzP) in mother adjusted for creatinine numeric Logarithm base 2 MBZP MBZP
hs_mecpp_cadj_Log2 hs_mecpp_cadj_Log2 Chemicals Phthalates MECPP Postnatal NA NA Mono-2-ethyl 5-carboxypentyl phthalate (MECPP) in child adjusted for creatinine numeric Logarithm base 2 MECPP MECPP
hs_mecpp_madj_Log2 hs_mecpp_madj_Log2 Chemicals Phthalates MECPP Pregnancy NA NA Mono-2-ethyl 5-carboxypentyl phthalate (MECPP) in mother adjusted for creatinine numeric Logarithm base 2 MECPP MECPP
hs_mehhp_cadj_Log2 hs_mehhp_cadj_Log2 Chemicals Phthalates MEHHP Postnatal NA NA Mono-2-ethyl-5-hydroxyhexyl phthalate (MEHHP) in child adjusted for creatinine numeric Logarithm base 2 MEHHP MEHHP
hs_mehhp_madj_Log2 hs_mehhp_madj_Log2 Chemicals Phthalates MEHHP Pregnancy NA NA Mono-2-ethyl-5-hydroxyhexyl phthalate (MEHHP) in mother adjusted for creatinine numeric Logarithm base 2 MEHHP MEHHP
hs_mehp_cadj_Log2 hs_mehp_cadj_Log2 Chemicals Phthalates MEHP Postnatal NA NA Mono-2-ethylhexyl phthalate (MEHP) in child adjusted for creatinine numeric Logarithm base 2 MEHP MEHP
hs_mehp_madj_Log2 hs_mehp_madj_Log2 Chemicals Phthalates MEHP Pregnancy NA NA Mono-2-ethylhexyl phthalate (MEHP) in mother adjusted for creatinine numeric Logarithm base 2 MEHP MEHP
hs_meohp_cadj_Log2 hs_meohp_cadj_Log2 Chemicals Phthalates MEOHP Postnatal NA NA Mono-2-ethyl-5-oxohexyl phthalate (MEOHP) in child adjusted for creatinine numeric Logarithm base 2 MEOHP MEOHP
hs_meohp_madj_Log2 hs_meohp_madj_Log2 Chemicals Phthalates MEOHP Pregnancy NA NA Mono-2-ethyl-5-oxohexyl phthalate (MEOHP) in mother adjusted for creatinine numeric Logarithm base 2 MEOHP MEOHP
hs_mep_cadj_Log2 hs_mep_cadj_Log2 Chemicals Phthalates MEP Postnatal NA NA Monoethyl phthalate (MEP) in child adjusted for creatinine numeric Logarithm base 2 MEP MEP
hs_mep_madj_Log2 hs_mep_madj_Log2 Chemicals Phthalates MEP Pregnancy NA NA Monoethyl phthalate (MEP) in mother adjusted for creatinine numeric Logarithm base 2 MEP MEP
hs_mibp_cadj_Log2 hs_mibp_cadj_Log2 Chemicals Phthalates MIBP Postnatal NA NA Mono-iso-butyl phthalate (MiBP) in child adjusted for creatinine numeric Logarithm base 2 MIBP MIBP
hs_mibp_madj_Log2 hs_mibp_madj_Log2 Chemicals Phthalates MIBP Pregnancy NA NA Mono-iso-butyl phthalate (MiBP) in mother adjusted for creatinine numeric Logarithm base 2 MIBP MIBP
hs_mnbp_cadj_Log2 hs_mnbp_cadj_Log2 Chemicals Phthalates MNBP Postnatal NA NA Mono-n-butyl phthalate (MnBP) in child adjusted for creatinine numeric Logarithm base 2 MNBP MNBP
hs_mnbp_madj_Log2 hs_mnbp_madj_Log2 Chemicals Phthalates MNBP Pregnancy NA NA Mono-n-butyl phthalate (MnBP) in mother adjusted for creatinine numeric Logarithm base 2 MNBP MNBP
hs_ohminp_cadj_Log2 hs_ohminp_cadj_Log2 Chemicals Phthalates OHMiNP Postnatal NA NA Mono-4-methyl-7-hydroxyoctyl phthalate (OHMiNP) in child adjusted for creatinine numeric Logarithm base 2 OHMiNP OHMiNP
hs_ohminp_madj_Log2 hs_ohminp_madj_Log2 Chemicals Phthalates OHMiNP Pregnancy NA NA Mono-4-methyl-7-hydroxyoctyl phthalate (OHMiNP) in mother adjusted for creatinine numeric Logarithm base 2 OHMiNP OHMiNP
hs_oxominp_cadj_Log2 hs_oxominp_cadj_Log2 Chemicals Phthalates OXOMINP Postnatal NA NA Mono-4-methyl-7-oxooctyl phthalate (OXOMiNP) in child adjusted for creatinine numeric Logarithm base 2 OXOMINP OXOMINP
hs_oxominp_madj_Log2 hs_oxominp_madj_Log2 Chemicals Phthalates OXOMINP Pregnancy NA NA Mono-4-methyl-7-oxooctyl phthalate (OXOMiNP) in mother adjusted for creatinine numeric Logarithm base 2 OXOMINP OXOMINP
hs_sumDEHP_cadj_Log2 hs_sumDEHP_cadj_Log2 Chemicals Phthalates DEHP Postnatal NA NA Sum of DEHP metabolites (µg/g) in child adjusted for creatinine numeric Logarithm base 2 DEHP SumDEHP
hs_sumDEHP_madj_Log2 hs_sumDEHP_madj_Log2 Chemicals Phthalates DEHP Pregnancy NA NA Sum of DEHP metabolites (µg/g) in mother adjusted for creatinine numeric Logarithm base 2 DEHP SumDEHP
FAS_cat_None FAS_cat_None Chemicals Social and economic capital Economic capital Postnatal NA NA Family affluence score factor None Family affluence FamAfl
hs_contactfam_3cat_num_None hs_contactfam_3cat_num_None Chemicals Social and economic capital Social capital Postnatal NA NA social capital: family friends factor None Social contact SocCont
hs_hm_pers_None hs_hm_pers_None Chemicals Social and economic capital Social capital Postnatal NA NA How many people live in your home? numeric None House crowding HouseCrow
hs_participation_3cat_None hs_participation_3cat_None Chemicals Social and economic capital Social capital Postnatal NA NA social capital: structural factor None Social participation SocPartic
e3_asmokcigd_p_None e3_asmokcigd_p_None Chemicals Tobacco Smoke Tobacco Smoke Pregnancy NA NA maternal active Tobacco Smoke pregnancy mean nb cig/day numeric None Cigarette Cigarette
hs_cotinine_cdich_None hs_cotinine_cdich_None Chemicals Tobacco Smoke Cotinine Postnatal NA NA Dichotomous variable of cotinine in child factor None Cotinine Cotinine
hs_cotinine_mcat_None hs_cotinine_mcat_None Chemicals Tobacco Smoke Cotinine Pregnancy NA NA Categorical variable of cotinine in mother factor None Cotinine Cotinine
hs_globalexp2_None hs_globalexp2_None Chemicals Tobacco Smoke Tobacco Smoke Postnatal NA NA Global exposure of the child to ETS (2 categories) factor None ETS ETS
hs_smk_parents_None hs_smk_parents_None Chemicals Tobacco Smoke Tobacco Smoke Postnatal NA NA Tobacco Smoke status of parents (both) factor None Smoking_parents SmokPar
h_distinvnear1_preg_Log h_distinvnear1_preg_Log Outdoor exposures Traffic Traffic Pregnancy Home NA Inverse distance to nearest road at pregnancy period numeric Natural Logarithm Distance road DistRoad
h_trafload_preg_pow1over3 h_trafload_preg_pow1over3 Outdoor exposures Traffic Traffic Pregnancy Home NA Total traffic load of all roads in 100 m buffer at pregnancy period numeric None Traffic_100m Traffic
h_trafnear_preg_pow1over3 h_trafnear_preg_pow1over3 Outdoor exposures Traffic Traffic Pregnancy Home NA Traffic density on nearest road at pregnancy period numeric None Traffic density TrafDens
hs_trafload_h_pow1over3 hs_trafload_h_pow1over3 Outdoor exposures Traffic Traffic Postnatal Home NA Total traffic load of all roads in 100 m buffer at home numeric None Trafficload - nearest TrafNeares
hs_trafnear_h_pow1over3 hs_trafnear_h_pow1over3 Outdoor exposures Traffic Traffic Postnatal Home NA Traffic density on nearest road at home numeric None Traffic near DistRoadH
h_bro_preg_Log h_bro_preg_Log Outdoor exposures Water DBPs Water DBPs Pregnancy Home NA Total concentration of Brominated during pregnancy numeric Natural Logarithm Brom_THMs Brom
h_clf_preg_Log h_clf_preg_Log Outdoor exposures Water DBPs Water DBPs Pregnancy Home NA Total concentration of chloroform during pregnancy numeric Natural Logarithm Chloroform Chloroform
h_thm_preg_Log h_thm_preg_Log Outdoor exposures Water DBPs Water DBPs Pregnancy Home NA Total concentration of trihalomethanes during pregnancy numeric Natural Logarithm THMs THMs
h_mbmi_None h_mbmi_None Covariates Covariates Maternal covariate Pregnancy NA NA Maternal pre-pregnancy body mass index (kg/m2) numeric None Maternal BMI mBMI
hs_c_height_None hs_c_height_None Covariates Covariates Child covariate Postnatal NA NA Height of the child at 6-11 years old (m) numeric None Child height cHeight
hs_c_weight_None hs_c_weight_None Covariates Covariates Child covariate Postnatal NA NA Weight of the child at 6-11 years old (kg) numeric None Child weight cWeight
hs_wgtgain_None hs_wgtgain_None Covariates Covariates Maternal covariate Pregnancy NA NA Maternal weight gain during pregnancy (kg) numeric None Weight gain Preg Weightgain
e3_gac_None e3_gac_None Covariates Covariates Child covariate Pregnancy NA NA Gestational age at birth (week) numeric None Gestational age at birth GestAge
e3_sex_None e3_sex_None Covariates Covariates Child covariate Pregnancy NA NA Child sex (female / male) factor None Child sex Sex
e3_yearbir_None e3_yearbir_None Covariates Covariates Child covariate Pregnancy NA NA Year of birth (2003 to 2009) factor None Year of birth YearBirth
h_age_None h_age_None Covariates Covariates Maternal covariate Pregnancy NA NA Maternal age (years) numeric None Maternal age mAge
h_cohort h_cohort Covariates Covariates Maternal covariate Pregnancy NA NA Cohort of inclusion (1 to 6) factor None Cohort Cohort
h_edumc_None h_edumc_None Covariates Covariates Maternal covariate Pregnancy NA NA Maternal education (1: primary school, 2:secondary school, 3:university degree or higher) factor None Maternal education mEducation
h_native_None h_native_None Covariates Covariates Child covariate Pregnancy NA NA Are the parents native from the country of the cohort (0: no native parent, 1:only one native parent, 2: both parents native) factor None Native Native
h_parity_None h_parity_None Covariates Covariates Maternal covariate Pregnancy NA NA Parity before index pregnancy (0: nulliparous, 1:primiparous, 2:multiparous) factor None Parity Parity
hs_child_age_None hs_child_age_None Covariates Covariates Child covariate Postnatal NA NA Child age at examination (years) numeric None Child age cAge
e3_bw e3_bw Phenotype Phenotype Outcome at birth Pregnancy NA NA Child weight at birth (g) numeric None Birthweight BW
hs_asthma hs_asthma Phenotype Phenotype Outcome at 6-11 years old Postnatal NA NA Doctor diagnosed asthma (ever) factor None Asthma Asthma
hs_zbmi_who hs_zbmi_who Phenotype Phenotype Outcome at 6-11 years old Postnatal NA NA Body mass index z-score at 6-11 years old - WHO reference - Standardized on sex and age numeric None Body mass index z-score zBMI
hs_correct_raven hs_correct_raven Phenotype Phenotype Outcome at 6-11 years old Postnatal NA NA Intelligence quotient at 6-11 years old - Total of correct answers at the RAVEN test numeric None Intelligence quotient IQ
hs_Gen_Tot hs_Gen_Tot Phenotype Phenotype Outcome at 6-11 years old Postnatal NA NA Neuro behavior - Internalizing and externalizing problems at 6-11 years old - CBCL scale numeric None Behavior Behavior
hs_bmi_c_cat hs_bmi_c_cat Phenotype Phenotype Outcome at 6-11 years old Postnatal NA NA Body mass index categories at 6-11 years old - WHO reference (1: Thinness, 2: Normal, 3:Overweight, 4: Obese) factor None Body mass index (cat) BMI_cat

Available Data Across Individuals

upsetSamples(helix_ma, nintersects = 10)


3. An Example of a Single-layer Omic Analysis for Exposome Data and an Outcome

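codebook <- read.table(paste0(work.dir, "/Data/codebook.txt"), sep="\t", header=TRUE)

# Outcome (alternatives: "hs_asthma", "hs_zbmi_who", "e3_bw")
outcome.Name <- "hs_bmi_c_cat"

# Covariates
covariate.Names <- c("h_mbmi_None","e3_sex_None","h_age_None","h_cohort","h_edumc_None")

# Exposure related
# exposure.group is compared with codebook$family using ==, so it must match a family
# name exactly (or be "All"). Family names as printed in the codebook table above:
# "Metals", "Organochlorines", "Organophosphate pesticides",
# "Polybrominated diphenyl ethers (PBDE)", "Per- and polyfluoroalkyl substances (PFAS)",
# "Phenols", "Phthalates"
exposure.group <- "Organochlorines"

if(exposure.group=="All") {
  exposure.Names <- as.character(codebook$variable_name[codebook$domain=="Chemicals"])
} else {
  exposure.Names <- as.character(codebook$variable_name[codebook$family==exposure.group])
}
# Select only maternal measures of exposure. The pattern "madj" matches the adjusted
# maternal variables; unadjusted maternal variables (e.g. hs_pb_m_Log2 in Metals)
# are named with "_m_" and are not matched by this pattern.
exposure.Names <- exposure.Names[grep("madj", exposure.Names)]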

# Analysis models to run
univariate <- TRUE
ridge <- TRUE
lasso <- TRUE
elasticnet <- TRUE
bayesian.selection <- TRUE
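
The exposure selection step above can be tried on a small stand-in for the codebook. This is a minimal sketch, not the workshop code: `codebook.toy` and the helper `select.exposures()` are hypothetical names introduced only for illustration, and the four rows are copied from the codebook table printed earlier.

```r
# Toy codebook: four rows taken from the printed codebook table
codebook.toy <- data.frame(
  variable_name = c("hs_dde_cadj_Log2", "hs_dde_madj_Log2",
                    "hs_pfoa_m_Log2", "hs_pb_m_Log2"),
  domain = "Chemicals",
  family = c("Organochlorines", "Organochlorines",
             "Per- and polyfluoroalkyl substances (PFAS)", "Metals"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the selection logic of the setup chunk
select.exposures <- function(codebook, group, pattern = "madj") {
  keep <- if (group == "All") codebook$domain == "Chemicals" else codebook$family == group
  names.group <- as.character(codebook$variable_name[keep])
  names.group[grep(pattern, names.group)]   # keep names matching the pattern
}

oc          <- select.exposures(codebook.toy, "Organochlorines")
metals.madj <- select.exposures(codebook.toy, "Metals")                  # no "madj" names
metals.m    <- select.exposures(codebook.toy, "Metals", pattern = "_m_")

print(oc)
print(metals.madj)
print(metals.m)
```

In this toy example the "madj" pattern returns an empty vector for Metals, so a different pattern (here "_m_", an assumption based on the variable names in the table) would be needed for families whose maternal measures are not adjusted.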

Exposome: Overview

The idea of the exposome was first discussed by Chris Wild in 2005 (1), who proposed using omic technologies to capture the environmental factors that influence human health and disease. The reasoning is that if factors in the environment affect our health, there should be molecular signatures that reflect this, and that these signatures, combined with an understanding of environmental drivers (e.g. changes in air pollution), can be used to measure both the imprint of external exposures within the individual and the internal consequences of those exposures. Rappaport and Smith (2) described this motivation well: if “…toxic effects are mediated through chemicals that alter critical molecules, cells, and physiological processes inside the body…, exposures are not restricted to chemicals (toxicants) entering the body from air, water, or food, for example, but also include chemicals produced by inflammation, oxidative stress, lipid peroxidation, infections, gut flora, and other natural processes”. Such chemicals can be measured with modern “metabolomic” techniques, using both targeted and untargeted approaches. The challenge for the resulting analysis is often how to identify the independent association of each measured, and often correlated, exposure feature with an outcome of interest, especially in high dimensions.

This lab section provides examples of descriptive statistics for exploring the data, followed by implementations of univariate regression, ridge, lasso, elastic net, and Bayesian selection. Because the measured exposure features are often assumed to reflect long-term effects of the environment that precede the outcome and other omic measures, the analysis is often extended to a mediation-type framework.
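
For orientation, ridge, lasso, and elastic net can be written as one penalized logistic regression. The form below is one common parameterization (the one used by the glmnet package). Whether the later sections use exactly this form is an assumption, as is leaving the covariate coefficients $\gamma$ unpenalized. The symbols $Y$, $X$, $U$, and $N$ follow the data-processing code, while $\beta_0$, $\gamma$, $\beta$, $\lambda$, and $\alpha$ are introduced here for illustration.

```latex
\[
(\hat\beta_0, \hat\gamma, \hat\beta)
  = \arg\min_{\beta_0,\gamma,\beta}
    \left\{
      -\frac{1}{N}\sum_{i=1}^{N}\Big[ Y_i\,\eta_i - \log\!\big(1 + e^{\eta_i}\big) \Big]
      + \lambda\left[ \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1 \right]
    \right\},
\qquad
\eta_i = \beta_0 + U_i^{\top}\gamma + X_i^{\top}\beta .
\]
```

Setting $\alpha = 0$ gives ridge, $\alpha = 1$ gives lasso, and intermediate values give the elastic net; $\lambda \ge 0$ controls the overall amount of shrinkage.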

Figure from (3).

References:

  1. Wild, C.P. (2005). Complementing the genome with an “exposome”: the outstanding challenge of environmental exposure measurement in molecular epidemiology. Cancer Epidemiol Biomarkers Prev 14, 1847-1850.

  2. Rappaport, S.M., and Smith, M.T. (2010). Epidemiology. Environment and disease risks. Science 330, 460-461.

  3. Vermeulen, R., Schymanski, E.L., Barabasi, A.L., and Miller, G.W. (2020). The exposome and health: Where chemistry meets biology. Science 367, 392-396.

The Question of interest:

  • How are measured exposures for Organochlorines associated with the outcome hs_bmi_c_cat?

Exposure mixture analysis

In assessing multiple exposures we often have several questions or goals of interest:
1) What is the independent effect of each exposure?
2) Do combinations of exposures act in a synergistic manner to increase risk?
3) What is the combined effect when an individual is exposed to a mixture of compounds?

The first goal is often explored via multivariable regression, and we provide some examples of this analysis below. The second goal can be explored with interaction analyses (covered within this workshop). The third goal often relies on mixture approaches. These approaches are not the focus of this particular workshop, but the SHARP training program does offer a workshop in this area. See https://www.publichealth.columbia.edu/research/precision-prevention/environmental-mixtures-workshop-applications-environmental-health-studies
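
The first two goals can be illustrated on simulated data. The sketch below is a minimal example and not part of the workshop analysis: the variables `x1`, `x2`, and `y`, the sample size, and all effect sizes are hypothetical choices made only to show how the model formulas differ.

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.6 * x1 + sqrt(1 - 0.6^2) * rnorm(n)    # exposure correlated with x1

# Assumed truth: x1 has a main effect, x2 has none, and the two interact
eta <- -1 + 0.5 * x1 + 0 * x2 + 0.4 * x1 * x2
y   <- rbinom(n, 1, plogis(eta))

fit.single <- glm(y ~ x2,      family = binomial)  # x2 alone: confounded by x1
fit.multi  <- glm(y ~ x1 + x2, family = binomial)  # goal 1: mutually adjusted effects
fit.int    <- glm(y ~ x1 * x2, family = binomial)  # goal 2: adds the x1:x2 interaction

print(round(coef(summary(fit.int)), 3))
```

Comparing the coefficient of `x2` across `fit.single` and `fit.multi` shows how correlation between exposures can change an estimate once the other exposure is included in the model.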

Exposome: Processing the Data

load(paste0(work.dir, "/Data/HELIX.MultiAssayExperiment.RData"))

variables <- c(covariate.Names, exposure.Names)
# 1) helix_ma[variables, ,] selects the variables but keeps the MultiAssayExperiment format;
# 2) intersectColumns() selects only individuals with complete data;
# 3) wideFormat() returns the result as a DataFrame
d <- wideFormat(intersectColumns(helix_ma[variables, ,]), colDataCols=outcome.Name)
## harmonizing input:
##   removing 11220 sampleMap rows not in names(experiments)
# Create exposure design matrix
X <- as.data.frame(apply(d[,paste("exposome",exposure.Names,sep="_")],2,as.numeric))
names(X) <- exposure.Names
X <- scale(X, center=T, scale=T)

# Create the outcome variable
Y <- d[,outcome.Name] # outcome
if(outcome.Name=="hs_bmi_c_cat") { Y <- ifelse(as.numeric(Y)>=3, 1, 0)}
if(outcome.Name=="e3_bw") { Y <- ifelse(as.numeric(Y)<2500, 1, 0)}

# Create the covariate design matrix
U <- as.data.frame(d[,paste("covariates",covariate.Names,sep="_")])
names(U) <- covariate.Names
U[,c("h_cohort","e3_sex_None","h_edumc_None")] <- lapply(U[,c("h_cohort","e3_sex_None","h_edumc_None")], factor)
U[,c("h_mbmi_None", "h_age_None")] <- lapply(U[,c("h_mbmi_None", "h_age_None")], as.numeric)
U <- model.matrix(as.formula(paste("~-1+", paste(covariate.Names, collapse="+"))), data=U)

# Other variables for analysis
N <- nrow(d) # number of individuals in the analysis
Q <- ncol(U)  # number of covariates in the matrix U
P <- ncol(X)  # number of exposures in the matrix X
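
The three processing steps above (standardizing the exposures, dichotomizing the outcome, and expanding the covariates into a design matrix) can be checked on a toy data frame. This is a sketch under stated assumptions: `toy` and its columns are hypothetical, and only the R functions mirror the code above.

```r
set.seed(2)
toy <- data.frame(
  expo1   = rnorm(6, mean = 5,  sd = 2),
  expo2   = rnorm(6, mean = -1, sd = 0.5),
  bmi_cat = c(1, 2, 2, 3, 4, 2),   # 1: Thinness, 2: Normal, 3: Overweight, 4: Obese
  sex     = factor(c("female", "male", "male", "female", "male", "female")),
  age     = c(30, 25, 41, 36, 29, 33)
)

# Exposures: center each column to mean 0 and scale to sd 1
X.toy <- scale(as.matrix(toy[, c("expo1", "expo2")]), center = TRUE, scale = TRUE)

# Outcome: overweight or obese (category >= 3) versus not
Y.toy <- ifelse(toy$bmi_cat >= 3, 1, 0)

# Covariates: numeric columns are kept, factors are expanded to indicator columns
U.toy <- model.matrix(~ -1 + age + sex, data = toy)

print(colnames(U.toy))
print(Y.toy)
```

Because the formula drops the intercept (`-1`), `model.matrix()` keeps an indicator column for every level of the first factor. If such a matrix is later used in a model that also has an intercept, one of those columns is redundant and `glm()` reports its coefficient as `NA`.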


Exposome: Descriptive Statistics for Organochlorines:

  • The analysis includes a total of 9 exposures:
    hs_dde_madj_Log2, hs_ddt_madj_Log2, hs_hcb_madj_Log2, hs_pcb118_madj_Log2, hs_pcb138_madj_Log2, hs_pcb153_madj_Log2, hs_pcb170_madj_Log2, hs_pcb180_madj_Log2, hs_sumPCBs5_madj_Log2

Summary Table for Each Exposure

summarytools::view(dfSummary(as.data.frame(X), style = 'grid',
                               max.distinct.values = 10, plain.ascii =   FALSE, valid.col = FALSE, headings = FALSE), method = "render")
No  Variable                           Mean (sd)  min ≤ med ≤ max     IQR (CV)              Freqs (% of Valid)    Missing
1   hs_dde_madj_Log2 [numeric]         0 (1)      -2.6 ≤ -0.1 ≤ 2.6   1.3 (-6.151817e+15)   736 distinct values   0 (0.0%)
2   hs_ddt_madj_Log2 [numeric]         0 (1)      -5.9 ≤ -0.1 ≤ 2.2   0.7 (2.030655e+17)    605 distinct values   0 (0.0%)
3   hs_hcb_madj_Log2 [numeric]         0 (1)      -9.4 ≤ -0.1 ≤ 3.4   1 (-7.467132e+15)     739 distinct values   0 (0.0%)
4   hs_pcb118_madj_Log2 [numeric]      0 (1)      -2.5 ≤ -0.2 ≤ 6.6   1.2 (-1.862764e+16)   580 distinct values   0 (0.0%)
5   hs_pcb138_madj_Log2 [numeric]      0 (1)      -9.8 ≤ 0 ≤ 4.1      1.4 (1.627889e+16)    743 distinct values   0 (0.0%)
6   hs_pcb153_madj_Log2 [numeric]      0 (1)      -2.3 ≤ 0 ≤ 5.2      1.6 (9.241889e+15)    745 distinct values   0 (0.0%)
7   hs_pcb170_madj_Log2 [numeric]      0 (1)      -1.8 ≤ -0.1 ≤ 4.1   1.4 (-1.544917e+17)   584 distinct values   0 (0.0%)
8   hs_pcb180_madj_Log2 [numeric]      0 (1)      -4.4 ≤ 0 ≤ 4.2      1.3 (4.554361e+16)    744 distinct values   0 (0.0%)
9   hs_sumPCBs5_madj_Log2 [numeric]    0 (1)      -2 ≤ -0.1 ≤ 3.6     1.3 (-3.403936e+15)   592 distinct values   0 (0.0%)

Generated by summarytools 1.0.1 (R version 4.2.2)
2023-01-09

Correlation Matrix for the Exposures:

cormat <- cor(X, use="complete.obs")
corrplot(cormat, type="upper", order="hclust",
         col=brewer.pal(n=8, name="RdYlBu"),
         title = "",
         addCoef.col = "black",
         tl.cex=.5, number.cex=.5)


Hierarchical clustering on Exposures:

#  hierarchical clustering
hc <- t(X) %>%
  dist(method = "euclidean") %>% # Compute dissimilarity matrix based on Euclidean space
  hclust(method = "ward.D2")     # Use Ward's minimum variance linkage

# Visualize using factoextra
# Cut in groups and color by groups
fviz_dend(hc, k = 3, # Cut in groups
          show_labels = TRUE, cex=0.4,
          color_labels_by_k = TRUE, # color labels by groups
          rect = TRUE # Add rectangle around groups
          )
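
Because `X` was standardized with `scale()`, the Euclidean distance between two exposure columns is a direct function of their Pearson correlation: the squared distance equals 2(n-1)(1-r). The sketch below checks this on simulated data (the matrix `X_toy` and its column names are assumptions made for illustration) and applies the same `ward.D2` clustering, using base R only.

```r
# Simulated standardized "exposures": a1 and a2 share a common signal
set.seed(2)
n <- 100
shared <- rnorm(n)
X_toy <- scale(cbind(a1 = shared + rnorm(n, sd = 0.3),
                     a2 = shared + rnorm(n, sd = 0.3),
                     b1 = rnorm(n),
                     b2 = rnorm(n)))

# Euclidean distance between columns versus the correlation-based formula
d_euc <- as.matrix(dist(t(X_toy)))
d_cor <- sqrt(2 * (n - 1) * pmax(1 - cor(X_toy), 0))
max_diff <- max(abs(d_euc - d_cor))
max_diff

# Same clustering call as in the workshop code, then cut into 3 groups
hc_toy <- hclust(dist(t(X_toy)), method = "ward.D2")
groups <- cutree(hc_toy, k = 3)
groups
```

Clustering standardized exposures on Euclidean distance therefore groups together exposures that are strongly positively correlated, which is why the dendrogram tends to mirror the correlation matrix.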

Exposome: Univariate Regression

if(univariate) {
  univariate.results <- t(sapply(1:P, FUN=function(p) {  # loop over exposures by index p so each result can be written to file
    x <- X[,p]
    reg <- glm(Y~x+U, family=binomial)    # perform logistic regression
    s.reg <- summary(reg)                 # get the summary for the regression
    c.reg <- s.reg$coef["x",]             # select the coefficients for the exposure
    write.table(t(c(exposure.Names[p], c.reg)), file="ExposomeUnivariateResults.txt", append=ifelse(p==1, F, T), quote=F, sep="\t", col.names=ifelse(p==1, T, F), row.names=F)
    return(c.reg)                         # returning the coefficients is fine for a small number of exposures; with many, rely on the written file to avoid memory issues
  }, simplify=T))
  univariate.results <- data.frame(exposure.Names,univariate.results)
}

Univariate results:

Univariate Summary Table:

if(univariate) { kable(univariate.results, digits=3, align="c", row.names=FALSE, col.names=c("Exposure","Estimate", "SD","Z statistic", "P-value"))}
Exposure Estimate SD Z statistic P-value
hs_dde_madj_Log2 0.120 0.075 1.592 0.111
hs_ddt_madj_Log2 0.016 0.075 0.208 0.835
hs_hcb_madj_Log2 -0.078 0.087 -0.903 0.367
hs_pcb118_madj_Log2 -0.150 0.120 -1.254 0.210
hs_pcb138_madj_Log2 -0.173 0.107 -1.617 0.106
hs_pcb153_madj_Log2 -0.210 0.125 -1.677 0.094
hs_pcb170_madj_Log2 -0.338 0.130 -2.607 0.009
hs_pcb180_madj_Log2 -0.138 0.126 -1.099 0.272
hs_sumPCBs5_madj_Log2 -0.380 0.125 -3.051 0.002

Univariate Manhattan Plot:

neglog.pvalues <- -log10(univariate.results$Pr...z..)
plot(1:nrow(univariate.results), neglog.pvalues, 
     pch=16, xaxt="n", ylim=c(0, max(neglog.pvalues, 3)),
     ylab="-log10(p-value)", xlab="")
text(x=1:nrow(univariate.results), y=par("usr")[3]-0.1, xpd=NA,
     labels=univariate.results$exposure.Names, adj=.9, srt=45, cex=.75)
abline(h=-log10(0.05/nrow(univariate.results)), lty=2, lwd=2, col=2)
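
The dashed line in the Manhattan plot is a Bonferroni threshold (0.05 divided by the number of exposures). As a small sketch, the p-values below are copied from the summary table above (rounded to three decimals, so the adjusted values are approximate; the shortened names are only labels) and passed to `p.adjust()` to compare Bonferroni with the less conservative Benjamini-Hochberg adjustment.

```r
# P-values as printed in the univariate summary table (3 decimals)
p_univ <- c(hs_dde = 0.111, hs_ddt = 0.835, hs_hcb = 0.367,
            hs_pcb118 = 0.210, hs_pcb138 = 0.106, hs_pcb153 = 0.094,
            hs_pcb170 = 0.009, hs_pcb180 = 0.272, hs_sumPCBs5 = 0.002)

# Threshold drawn as the dashed line in the Manhattan plot
bonf_threshold <- 0.05 / length(p_univ)

# Adjusted p-values under two common corrections
p_bonf <- p.adjust(p_univ, method = "bonferroni")
p_bh   <- p.adjust(p_univ, method = "BH")

round(cbind(raw = p_univ, bonferroni = p_bonf, BH = p_bh), 3)
names(p_univ)[p_univ < bonf_threshold]
```

Because these exposures are strongly correlated, the nine tests are far from independent and a Bonferroni correction is likely to be conservative here.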


Exposome: Ridge regression

if(ridge) {
  ridge.cv <- cv.glmnet(x=X, y=Y, family="binomial", alpha=0)  # alpha=0 is for ridge
  ridge.coef <- coef(ridge.cv, s = "lambda.min")
  ridge.fit <- glmnet(x=X, y=Y, family="binomial", alpha=0)
}

Ridge Results:

Ridge Selection of \(\lambda\) via Cross Validation

if(ridge) { plot(ridge.cv) }

Ridge Coefficient Shrinkage

if(ridge) { 
  plot(ridge.fit, xvar="lambda", label=T)
  abline(v=log(ridge.cv$lambda.min), lty=2, col="red")
  abline(v=log(ridge.cv$lambda.1se), lty=2, col="green")
}

Ridge Coefficients for the Selected Model

if(ridge) { ridge.coef }
## 10 x 1 sparse Matrix of class "dgCMatrix"
##                                s1
## (Intercept)           -0.86512355
## hs_dde_madj_Log2       0.14168571
## hs_ddt_madj_Log2       0.07171705
## hs_hcb_madj_Log2       0.09740094
## hs_pcb118_madj_Log2   -0.03061129
## hs_pcb138_madj_Log2   -0.02064578
## hs_pcb153_madj_Log2    0.01217756
## hs_pcb170_madj_Log2   -0.07540445
## hs_pcb180_madj_Log2    0.03147135
## hs_sumPCBs5_madj_Log2 -0.10445286


Exposome: LASSO regression

if(lasso) {
  lasso.cv <- cv.glmnet(x=X, y=Y, family="binomial", alpha=1)  # alpha=1 is for lasso
  lasso.coef <- coef(lasso.cv, s = "lambda.min")
  lasso.fit <- glmnet(x=X, y=Y, family="binomial", alpha=1)
}

LASSO Results:

LASSO Selection of \(\lambda\) via Cross Validation

if(lasso) { plot(lasso.cv) }

LASSO Coefficient Shrinkage

if(lasso) { 
  plot(lasso.fit, xvar="lambda", label=T)
  abline(v=log(lasso.cv$lambda.min), lty=2, col="red")
  abline(v=log(lasso.cv$lambda.1se), lty=2, col="green")
}

LASSO Coefficients for the Selected Model

if(lasso) { lasso.coef }
## 10 x 1 sparse Matrix of class "dgCMatrix"
##                                s1
## (Intercept)           -0.86857898
## hs_dde_madj_Log2       0.17290757
## hs_ddt_madj_Log2       0.05854869
## hs_hcb_madj_Log2       0.11836118
## hs_pcb118_madj_Log2    .         
## hs_pcb138_madj_Log2    .         
## hs_pcb153_madj_Log2    .         
## hs_pcb170_madj_Log2   -0.04365484
## hs_pcb180_madj_Log2    .         
## hs_sumPCBs5_madj_Log2 -0.15671124


Exposome: Elastic net regression

if(elasticnet) {
  elasticnet.cv <- cv.glmnet(x=X, y=Y, family="binomial", alpha=0.5)  # alpha=0.5 is for elastic net
  elasticnet.coef <- coef(elasticnet.cv, s = "lambda.min")
  elasticnet.fit <- glmnet(x=X, y=Y, family="binomial", alpha=0.5)
}

Elastic net Results:

Elastic net Selection of \(\lambda\) via Cross Validation

if(elasticnet) { plot(elasticnet.cv) }

Elastic net Coefficient Shrinkage

if(elasticnet) { 
  plot(elasticnet.fit, xvar="lambda", label=T)
  abline(v=log(elasticnet.cv$lambda.min), lty=2, col="red")
  abline(v=log(elasticnet.cv$lambda.1se), lty=2, col="green")
}

Elastic net Coefficients for the Selected Model

if(elasticnet) { elasticnet.coef }
## 10 x 1 sparse Matrix of class "dgCMatrix"
##                                s1
## (Intercept)           -0.86592174
## hs_dde_madj_Log2       0.16260223
## hs_ddt_madj_Log2       0.05398904
## hs_hcb_madj_Log2       0.10562583
## hs_pcb118_madj_Log2    .         
## hs_pcb138_madj_Log2    .         
## hs_pcb153_madj_Log2    .         
## hs_pcb170_madj_Log2   -0.04773519
## hs_pcb180_madj_Log2    .         
## hs_sumPCBs5_madj_Log2 -0.13862298


Exposome: Bayesian stochastic feature selection

if(bayesian.selection) { 
  U <- U[,2:ncol(U)]  # drops the first column of U, which here is h_mbmi_None (it is absent from the table below)
  reg.bas <- bas.glm(Y~X+U, family = binomial(link = "logit"),
                   betaprior = bic.prior(), modelprior=beta.binomial(1,P),
                   include.always = ~U)
  coef.bas <- coef(reg.bas, estimator="BMA")
  coef.r <- data.frame(c("Intercept", exposure.Names, names(as.data.frame(U))), coef.bas$postmean, coef.bas$postsd,coef.bas$probne0)
  names(coef.r) <- c("Variable", "Estimate", "Standard Deviation", "Pr(B!=0)")
}

Bayesian stochastic feature selection Results:

Posterior Probability of Inclusion

if(bayesian.selection) { 
  plot(reg.bas, which=c(4))
}

Marginal Posterior Estimates

if(bayesian.selection) { 
  kable(coef.r, digits=3, align="c", row.names=FALSE)
}
Variable Estimate Standard Deviation Pr(B!=0)
Intercept -2.314 0.594 1.000
hs_dde_madj_Log2 0.002 0.020 0.018
hs_ddt_madj_Log2 0.000 0.005 0.004
hs_hcb_madj_Log2 0.000 0.004 0.002
hs_pcb118_madj_Log2 0.000 0.013 0.007
hs_pcb138_madj_Log2 -0.004 0.035 0.019
hs_pcb153_madj_Log2 -0.006 0.045 0.021
hs_pcb170_madj_Log2 -0.085 0.178 0.204
hs_pcb180_madj_Log2 -0.014 0.071 0.046
hs_sumPCBs5_madj_Log2 -0.163 0.220 0.384
e3_sex_Nonefemale 0.000 0.135 1.000
e3_sex_Nonemale 0.000 0.000 1.000
h_age_None 0.033 0.017 1.000
h_cohort2 1.089 0.546 1.000
h_cohort3 0.829 0.276 1.000
h_cohort4 0.498 0.265 1.000
h_cohort5 0.126 0.354 1.000
h_cohort6 0.894 0.283 1.000
h_edumc_None2 0.074 0.228 1.000
h_edumc_None3 -0.319 0.229 1.000

4. An Example of a Single-layer Omic Analysis for Genomic Data and an Outcome

Genome Overview

Genome-wide association studies (GWAS) have been extremely successful in identifying single nucleotide polymorphisms (SNPs) associated with traits and disease outcomes. By far the most prominent analysis technique for GWAS is to treat each SNP as independent and perform a genome-wide scan with numerous univariate regression models. This brief tutorial performs this analysis and summarizes the results on a subset of SNPs simulated to accompany the ISGlobal Exposome Data Challenge dataset.

This is not a comprehensive example of a GWAS analysis; it is designed to provide insight into the genomic data and a foundation for further analyses. Current techniques leveraging germline genetics include GxE analyses, polygenic risk scores, and the use of summary statistics for Mendelian randomization studies and TWAS (and related) studies that often leverage additional omic data.

# Outcome
outcome.Name <- "hs_bmi_c_cat" # "hs_asthma" # "hs_bmi_c_cat" "hs_zbmi_who"

# Covariates
covariate.Names <- c("h_mbmi_None","e3_sex_None","h_age_None","h_cohort","h_edumc_None","ethn_PC1","ethn_PC2") 

# SNPs
snp.Names <- paste("SNP", 1:1000, sep=".")

# Analysis models to run
univariate <- T

Genome: Processing the Data

load(paste0(work.dir, "/Data/HELIX.MultiAssayExperiment.RData")) # not recommended way of storing genomewide data

variables <- c(covariate.Names, "h_ethnicity_cauc", snp.Names)
d <- wideFormat(intersectColumns(helix_ma[variables, ,]), colDataCols=outcome.Name) # 1) select variables but keep in MultiAssayExperiment format; 2) intersectColumns selects only individuals with complete data; 3) wideFormat returns a DataFrame
## harmonizing input:
##   removing 7854 sampleMap rows not in names(experiments)
# Create the SNP design matrix
X <- d[,paste0("genome_", snp.Names)]
names(X) <- snp.Names
X <- as.matrix(X)

# Create the outcome variable
Y <- d[,outcome.Name] # outcome
if(outcome.Name=="hs_bmi_c_cat") { Y <- ifelse(as.numeric(Y)>=3, 1, 0)}

# Create the covariate design matrix
U <- d[,c(paste0("covariates_", covariate.Names[1:5]), paste0("proteome.cov_", covariate.Names[6:7]))]
names(U) <- covariate.Names
U[,c("h_cohort","e3_sex_None","h_edumc_None")] <- lapply(U[,c("h_cohort","e3_sex_None","h_edumc_None")], factor)
U[,c("h_mbmi_None", "h_age_None","ethn_PC1","ethn_PC2")] <- lapply(U[,c("h_mbmi_None", "h_age_None","ethn_PC1","ethn_PC2")], as.numeric)
U <- model.matrix(as.formula(paste("~-1+", paste(covariate.Names, collapse="+"))), data=U) 


# Other variables for analysis
N <- nrow(d) # number of individuals in the analysis
Q <- ncol(U)  # number of covariates in the matrix U
P <- ncol(X)  # number of SNPs in the matrix X


Genome: Descriptive Statistics

  • The genome data include a total of 1000 single nucleotide polymorphisms (SNPs).

Plot of Genetic Ancestry as Estimated by Principal Components

plot(d$proteome.cov_ethn_PC1, d$proteome.cov_ethn_PC2, pch=16, col=ifelse(d$proteome.cov_h_ethnicity_cauc=="yes", 1, 2),
     xlab="Component 1", ylab="Component 2")
legend(x="topleft", legend=c("Caucasian", "Other"), col=c(1,2), pch=16)

Correlation Matrix for Local Region of the Genome:

cormat <- round(cor(X[,1:(P/5)], use="complete.obs"), 2)
cormat[lower.tri(cormat)]<- NA
melted_cormat <- melt(cormat)
ggplot(data = melted_cormat, aes(Var2, Var1, fill = value))+
  geom_tile(color = "white")+
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                       midpoint = 0, limit = c(-1,1), space = "Lab", 
                       name="Pearson\nCorrelation") +
  theme_minimal()+
  theme(axis.text.x = element_blank(), axis.text.y = element_blank())+
  labs(y= "SNPs", x = "SNPs")+
  coord_fixed()


Genome: Univariate Regression

if(univariate) {
  univariate.results <- t(sapply(1:P, FUN=function(p) {  # loop over SNPs by index p so each result can be written to file
    x <- X[,p]
    reg <- glm(Y~x+U, family=binomial)    # perform logistic regression
    s.reg <- summary(reg)                 # get the summary for the regression
    c.reg <- s.reg$coef["x",]             # select the coefficients for the SNP
    write.table(t(c(snp.Names[p], c.reg)), file="GenomeUnivariateResults.txt", append=ifelse(p==1, F, T), quote=F, sep="\t", col.names=ifelse(p==1, T, F), row.names=F)
    return(c.reg)                         # returning the coefficients is fine for a small number of SNPs; with many, rely on the written file to avoid memory issues
  }, simplify=T))
  univariate.results <- data.frame(snp.Names,univariate.results)
  names(univariate.results) <- c("SNP.Name","Estimate", "SD","Z.statistic", "P.value")
  univariate.results$P.value <- format(univariate.results$P.value, scientific=T)
}

Univariate results:

Univariate Summary Table:

if(univariate) { kable(univariate.results[as.numeric(univariate.results$P.value)<0.05,], digits=3, align="c", row.names=FALSE, col.names=c("SNP","Estimate", "SD","Z Statistics", "P-value"))}
SNP Estimate SD Z Statistics P-value
SNP.9 0.329 0.137 2.410 1.594410e-02
SNP.10 0.321 0.112 2.864 4.179853e-03
SNP.14 0.230 0.113 2.029 4.244698e-02
SNP.15 0.341 0.152 2.248 2.456820e-02
SNP.16 0.278 0.098 2.825 4.724717e-03
SNP.17 0.238 0.098 2.433 1.496590e-02
SNP.20 0.219 0.096 2.278 2.270211e-02
SNP.21 0.193 0.097 1.992 4.639044e-02
SNP.44 0.308 0.125 2.463 1.377572e-02
SNP.46 0.226 0.111 2.040 4.130533e-02
SNP.52 0.449 0.165 2.720 6.522377e-03
SNP.62 0.211 0.104 2.038 4.153780e-02
SNP.100 0.278 0.075 3.697 2.182341e-04
SNP.125 -0.256 0.123 -2.071 3.831718e-02
SNP.132 -0.207 0.103 -2.002 4.529343e-02
SNP.183 0.417 0.079 5.278 1.302509e-07
SNP.228 0.281 0.100 2.807 5.003849e-03
SNP.259 0.328 0.161 2.033 4.200655e-02
SNP.278 0.370 0.076 4.874 1.095336e-06
SNP.285 0.242 0.101 2.405 1.615891e-02
SNP.297 0.524 0.081 6.484 8.937938e-11
SNP.365 -0.373 0.189 -1.974 4.837617e-02
SNP.436 -0.325 0.133 -2.455 1.409099e-02
SNP.502 0.345 0.077 4.475 7.647719e-06
SNP.532 0.257 0.119 2.163 3.057012e-02
SNP.573 0.950 0.099 9.585 9.253401e-22
SNP.575 0.292 0.124 2.356 1.848794e-02
SNP.602 -0.235 0.101 -2.319 2.037771e-02
SNP.626 0.367 0.142 2.591 9.577285e-03
SNP.632 0.206 0.101 2.030 4.235610e-02
SNP.645 0.239 0.109 2.184 2.894422e-02
SNP.651 0.496 0.087 5.677 1.373821e-08
SNP.655 -0.331 0.160 -2.074 3.807018e-02
SNP.691 -0.267 0.126 -2.118 3.420752e-02
SNP.703 -0.243 0.101 -2.409 1.598263e-02
SNP.743 0.194 0.098 1.968 4.908090e-02
SNP.749 0.319 0.119 2.684 7.283509e-03
SNP.750 0.228 0.096 2.363 1.812390e-02
SNP.757 -0.192 0.098 -1.964 4.953584e-02
SNP.761 -0.371 0.171 -2.177 2.951590e-02
SNP.784 0.219 0.100 2.179 2.930437e-02
SNP.787 0.213 0.102 2.082 3.731017e-02
SNP.802 0.509 0.195 2.610 9.054696e-03
SNP.816 0.249 0.107 2.325 2.006790e-02
SNP.830 0.312 0.158 1.975 4.824190e-02
SNP.874 -0.490 0.173 -2.825 4.730599e-03
SNP.893 -0.212 0.099 -2.146 3.189596e-02
SNP.895 -0.227 0.105 -2.161 3.068424e-02
SNP.896 -0.277 0.124 -2.231 2.569164e-02
SNP.900 0.201 0.076 2.635 8.408207e-03
SNP.942 0.330 0.081 4.073 4.633300e-05
SNP.943 0.228 0.093 2.457 1.399408e-02
SNP.944 0.356 0.122 2.915 3.559579e-03
SNP.945 0.236 0.107 2.200 2.782052e-02
SNP.947 0.247 0.104 2.361 1.824135e-02
SNP.961 -0.205 0.103 -1.999 4.565292e-02
SNP.982 -0.315 0.130 -2.428 1.518755e-02
SNP.983 -0.352 0.177 -1.985 4.709611e-02
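
A nominal threshold of 0.05 is not very informative when 1000 SNPs are tested. The short calculation below compares the 58 rows listed in the table above with the number of hits expected by chance if no SNP were associated, and applies the Bonferroni threshold to the smallest p-values copied from that table (the eight SNPs with p < 0.001). The object names are illustrative.

```r
P_snps <- 1000          # number of SNPs tested
alpha_nominal <- 0.05

# Expected number of p < 0.05 if no SNP were associated with the outcome
expected_null_hits <- P_snps * alpha_nominal
observed_hits <- 58     # rows in the summary table above
c(expected = expected_null_hits, observed = observed_hits)

# Bonferroni threshold, as drawn in a Manhattan plot for 1000 tests
bonf_threshold <- alpha_nominal / P_snps

# Smallest p-values, copied from the summary table above
p_smallest <- c(SNP.100 = 2.182341e-04, SNP.183 = 1.302509e-07,
                SNP.278 = 1.095336e-06, SNP.297 = 8.937938e-11,
                SNP.502 = 7.647719e-06, SNP.573 = 9.253401e-22,
                SNP.651 = 1.373821e-08, SNP.942 = 4.633300e-05)
bonf_hits <- names(p_smallest)[p_smallest < bonf_threshold]
bonf_hits
```

Most of the nominal hits are therefore compatible with chance, while a small set of SNPs remains after correction for multiple testing.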

Univariate Manhattan Plot:

neglog.pvalues <- -log10(as.numeric(univariate.results$P.value))
plot(1:nrow(univariate.results), neglog.pvalues, 
     pch=16, xaxt="n", ylim=c(0, max(neglog.pvalues, 3)),
     ylab="-log10(p-value)", xlab="SNPs")
abline(h=-log10(0.05/nrow(univariate.results)), lty=2, lwd=2, col=2)

Univariate QQ-Plot:

pvalues <- as.numeric(univariate.results$P.value)
r <- gcontrol2(pvalues, pch=16)
lambda <- round(r$lambda,3)
text(x=1, y=5, labels=bquote(lambda == .(lambda)), cex=2)
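
The value of λ shown on the QQ-plot is a genomic inflation factor. A common median-based definition converts each p-value to a 1-df chi-square statistic and divides the observed median by the median expected under the null. The sketch below uses this definition on simulated p-values; whether `gcontrol2` computes λ in exactly this way is an assumption, and the helper `gc_lambda` is hypothetical.

```r
# Median-based genomic inflation factor from a vector of p-values
gc_lambda <- function(p) {
  chisq_obs <- qchisq(p, df = 1, lower.tail = FALSE)
  median(chisq_obs) / qchisq(0.5, df = 1)
}

set.seed(42)
z <- rnorm(1e5)                              # null test statistics
p_null <- 2 * pnorm(-abs(z))                 # well-calibrated p-values
p_infl <- 2 * pnorm(-abs(z * sqrt(1.3)))     # chi-square statistics inflated by 1.3

lambda_null <- gc_lambda(p_null)
lambda_infl <- gc_lambda(p_infl)
c(null = lambda_null, inflated = lambda_infl)
```

Values close to 1 suggest well-calibrated test statistics. Values clearly above 1 can indicate residual confounding, for example by ancestry, although true polygenic signal also raises λ.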



5. An Example of a Two-layer Omic Analysis With Both Layers High-Dimensional

Exposome and Proteome: Overview

When there are two or more high-dimensional layers of omic data, there are several ways to approach the analysis.

  1. “Late” Integration: Perform all pairwise associations via regression and then explore the results with post-processing and/or analyses:
    1. Explore the results leveraging a priori knowledge or annotation for each layer to look for patterns of biological overlap or interest. Such analyses may examine pathways or genes that can be annotated across omic layers to see whether the results are consistent.
    2. Perform formal pathway analysis on the pairwise results. Such pathway analyses are common in gene expression or genomic studies and look for over-representation of significant (or noteworthy) features within certain pathways. Such analyses depend heavily on accurately representing the null distribution of all pathways or genes and, as a result, are often very omic-specific in their implementation.
    3. Perform a data-driven analysis of the pairwise results to investigate relationships. Such analyses include hierarchical clustering, principal components, or network analysis.
  2. “Early” Integration: First perform clustering or dimension reduction:
    1. This can be performed on each omic layer independently, after which commonalities are explored or clusters/components are integrated across omic layers.
    2. Append/concatenate omic layers into a single data set and perform clustering or dimension reduction across all layers simultaneously. This approach often comes with certain assumptions about the exchangeability across omic layers.
  3. “Mixed” Integration: Alternative approaches that simultaneously identify clusters/components while also investigating the relationship between omic layers. An example of this approach is partial least squares regression, which identifies components that maximize the covariance between multiple outcome variables and numerous independent variables.
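
To make the partial least squares idea concrete, the following sketch computes a first pair of components for two simulated layers as the leading singular vectors of their cross-covariance matrix. These are the weight vectors whose resulting scores have the largest possible covariance. This is the first step of an SVD-based PLS only; full PLS regression adds deflation and further components. The matrices `X_toy` and `Y_toy` are simulated assumptions.

```r
# Two simulated layers that share one latent signal
set.seed(3)
n <- 150
latent <- rnorm(n)
X_toy <- scale(cbind(latent + rnorm(n, sd = 0.5),
                     latent + rnorm(n, sd = 0.5),
                     rnorm(n)))
Y_toy <- scale(cbind(latent + rnorm(n, sd = 0.5),
                     rnorm(n)))

# Leading singular vectors of the cross-covariance matrix
sv <- svd(crossprod(X_toy, Y_toy) / (n - 1))
u <- sv$u[, 1]   # weights for the X layer
v <- sv$v[, 1]   # weights for the Y layer

# Component scores and their covariance (equals the first singular value)
t_x <- X_toy %*% u
t_y <- Y_toy %*% v
cov_first <- cov(t_x, t_y)[1, 1]
c(covariance = cov_first, singular_value = sv$d[1])

# Any other pair of unit-length weight vectors gives a smaller covariance
u_r <- rnorm(3); u_r <- u_r / sqrt(sum(u_r^2))
v_r <- rnorm(2); v_r <- v_r / sqrt(sum(v_r^2))
cov_random <- cov(X_toy %*% u_r, Y_toy %*% v_r)[1, 1]
cov_random
```

The weights in `u` and `v` indicate which variables in each layer drive the shared component; here they should concentrate on the columns built from `latent`.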

In this example, we investigate approach #1, in which all pairwise regressions are performed between two omic layers. We then explore the results visually (#1.1 above) and through data-driven approaches (#1.3 above). Note that in this discussion and example we do not have a specific, defined outcome that we are ultimately interested in exploring. The link of multi-omic data to a specific outcome is what we primarily explore in the workshop lectures and labs.

codebook <- read.table(paste0(work.dir, "/Data/codebook.txt"), sep="\t", header=T)

# Covariates
covariate.Names <- c("e3_sex_None","h_cohort", "age_sample_years","ethn_PC1","ethn_PC2","hs_dift_mealblood_imp","blood_sam4")

# Exposure related
exposure.group <- "Organochlorines" # one of {"Metals", "Organochlorines", "Organophosphate pesticides", "PBDE", "PFAS", "Phenols", "Phthalates", "All"}

if(exposure.group=="All") { exposure.Names <- as.character(codebook$variable_name[codebook$domain=="Chemicals"]) }
if(exposure.group!="All") { exposure.Names <- as.character(codebook$variable_name[codebook$family==exposure.group]) }
exposure.Names <- exposure.Names[grep("madj", exposure.Names)] # keep only the "madj" variables (in the HELIX codebook naming these appear to be the maternal, lipid-adjusted measures)

# Proteome
proteome.Names <- c("Adiponectin","CRP","APO.A1","APO.B","APO.E","IL1beta","IL6","MCP1","Leptin","HGF","INSULIN","TNFalfa","BAFF","Cpeptide","PAI1","IL8","FGFBasic","GCSF","IL10","IL13","IL12","Eotaxin","IL17","MIP1alfa","MIP1beta","IL15","EGF","IL5","IFNgamma","IFNalfa","IL1RA","IL2","IP10","IL2R","MIG","IL4")

# Analysis models to run
univariate <- T

Exposome and Proteome: Processing the Data

load(paste0(work.dir, "/Data/HELIX.MultiAssayExperiment.RData"))

variables <- c(covariate.Names, exposure.Names, proteome.Names)
d <- wideFormat(intersectColumns(helix_ma[variables, ,])) # 1) select variables but keep in MultiAssayExperiment format; 2) intersectColumns selects only individuals with complete data; 3) wideFormat returns a DataFrame
## harmonizing input:
##   removing 6732 sampleMap rows not in names(experiments)
# Create the proteome design matrix
X <- as.data.frame(d[,paste("proteome",proteome.Names,sep="_")])
names(X) <- proteome.Names
X <- scale(X, center=T, scale=T)

# Create exposure design matrix
W <- as.data.frame(apply(d[,paste("exposome",exposure.Names,sep="_")],2,as.numeric))
names(W) <- exposure.Names
W <- scale(W, center=T, scale=T)

# Create the covariate design matrix
U <- d[,c(paste0("covariates_", covariate.Names[1:2]), paste0("metabol_urine.cov_", covariate.Names[3:7]))]
names(U) <- covariate.Names
U[,c("h_cohort","e3_sex_None")] <- lapply(U[,c("h_cohort","e3_sex_None")], factor)
U[,c("age_sample_years","ethn_PC1","ethn_PC2","hs_dift_mealblood_imp","blood_sam4")] <- lapply(U[,c("age_sample_years","ethn_PC1","ethn_PC2","hs_dift_mealblood_imp","blood_sam4")], as.numeric)
U <- model.matrix(as.formula(paste("~-1+", paste(covariate.Names, collapse="+"))), data=U) 

# Other variables for analysis
N <- nrow(d) # number of individuals in the analysis
Q <- ncol(U)  # number of covariates in the matrix U
P <- ncol(X)  # number of proteome features in the matrix X
R <- ncol(W)  # number of exposome features in the matrix W


Exposome and Proteome: Pairwise Univariate Regression Between Exposures and Proteins

if(univariate) {
  univariate.results <- {}
  beta.results <- matrix(0, nrow=R, ncol=P)
  p.results <- matrix(0, nrow=R, ncol=P)
  for(r in 1:R) { # loop through exposures
    w <- W[,r]
    for(p in 1:P) { # loop through proteins
      x <- X[,p]
      reg <- glm(x~w+U, family=gaussian)
      s.reg <- summary(reg)                 # get the summary for the regression
      c.reg <- s.reg$coef["w",]             # select the coefficients for the exposure
      r.reg <- c(exposure.Names[r], proteome.Names[p], c.reg)
      write.table(t(r.reg), file="ExposomeProteomeUnivariateResults.txt", append=ifelse(p*r==1, F, T), quote=F, sep="\t", col.names=ifelse(p*r==1, T, F), row.names=F)
      beta.results[r,p] <- as.numeric(r.reg["Estimate"])
      p.results[r,p] <- as.numeric(r.reg["Pr(>|t|)"])
      univariate.results <- rbind(univariate.results, r.reg)
    }
  }
  univariate.results <- as.data.frame(univariate.results)
  names(univariate.results) <- c("Exposure", "Proteome", names(univariate.results)[3:6])
  beta.results <- as.data.frame(beta.results)
  p.results <- as.data.frame(p.results)
  names(beta.results) <- proteome.Names
  names(p.results) <- proteome.Names
  row.names(beta.results) <- exposure.Names
  row.names(p.results) <- exposure.Names
}

beta.results.long <- melt(as.matrix(beta.results))
names(beta.results.long) <- c("Exposure", "Protein", "Effect")

beta.pca <- prcomp(beta.results, scale = TRUE)
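
For Gaussian outcomes, the inner loop over proteins can also be expressed as a single multi-response linear model, because `lm()` accepts a matrix on the left-hand side and fits each column separately with the same design. The sketch below checks on simulated data (all objects with a `_toy` suffix are assumptions) that this gives the same effect estimates as the double loop.

```r
# Simulated layers: 3 exposures, 4 proteins, 2 covariates
set.seed(7)
n <- 200
W_toy <- scale(matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, paste0("exposure", 1:3))))
X_toy <- scale(matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("protein", 1:4))))
U_toy <- cbind(age = rnorm(n), sex = rbinom(n, 1, 0.5))

# Double loop: one regression per exposure-protein pair
beta_loop <- matrix(NA_real_, ncol(W_toy), ncol(X_toy),
                    dimnames = list(colnames(W_toy), colnames(X_toy)))
for (r in seq_len(ncol(W_toy))) {
  for (p in seq_len(ncol(X_toy))) {
    beta_loop[r, p] <- coef(lm(X_toy[, p] ~ W_toy[, r] + U_toy))[2]
  }
}

# Multi-response fit: all proteins at once for each exposure
beta_mlm <- t(sapply(seq_len(ncol(W_toy)), function(r) {
  coef(lm(X_toy ~ W_toy[, r] + U_toy))[2, ]   # row 2 holds the exposure effect
}))
dimnames(beta_mlm) <- dimnames(beta_loop)

max_abs_diff <- max(abs(beta_loop - beta_mlm))
max_abs_diff
```

The loop remains convenient when results must be written to file pair by pair, as in the workshop code; the matrix form is mainly a speed-up when the number of proteins is large.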

Exposome and Proteome: Visualization of Univariate results:

Univariate Summary Table:

if(univariate) { kable(univariate.results[as.numeric(univariate.results[,"Pr(>|t|)"]) <0.05,], digits=3, align="c", row.names=FALSE, col.names=c("Exposure", "Protein","Estimate", "SD","t Value", "P Value"))}
Exposure Protein Estimate SD t Value P Value
hs_ddt_madj_Log2 PAI1 0.0655980263911758 0.0222916816296171 2.94271322734222 0.00332119390441221
hs_ddt_madj_Log2 FGFBasic -0.0591217726983932 0.0288577947729325 -2.04872801832547 0.040723688453088
hs_hcb_madj_Log2 APO.A1 0.0808454866534083 0.0371193241724781 2.17798918638047 0.0296169818286333
hs_hcb_madj_Log2 FGFBasic -0.0681603743455859 0.0340517970716978 -2.00166746565742 0.0455637497746824
hs_hcb_madj_Log2 IL10 -0.111039202752254 0.0367216519466496 -3.02380739607181 0.0025535488315567
hs_pcb138_madj_Log2 IL1beta -0.0807185736973553 0.0393129713954478 -2.05323003660573 0.0402844267333609
hs_pcb170_madj_Log2 CRP -0.131402752930642 0.050864664804739 -2.58337990499054 0.00991083721192663
hs_pcb170_madj_Log2 IL1beta -0.131460818837491 0.0481399976222637 -2.73080235418818 0.00641819810301397
hs_pcb170_madj_Log2 IL6 -0.107979982767619 0.0487130259305386 -2.21665521089966 0.0268489486073114
hs_pcb170_madj_Log2 MIP1beta 0.0893370838758477 0.0426613014618386 2.09410123026278 0.0364775260138151
hs_pcb180_madj_Log2 APO.A1 0.110222789580892 0.0465039907207924 2.37017915822888 0.0179497811573896
hs_sumPCBs5_madj_Log2 CRP -0.129142690409208 0.0506317312054623 -2.55062758737501 0.0108863794381994
hs_sumPCBs5_madj_Log2 IL1beta -0.0998494637762745 0.0479831741840394 -2.08092660550763 0.0376696679065281
hs_sumPCBs5_madj_Log2 IL6 -0.101482269912699 0.0484979388201711 -2.09250686485858 0.0366200626606123

Univariate Manhattan Plot:

neglog.pvalues <- -log10(as.numeric(univariate.results[,"Pr(>|t|)"]))
plot(1:nrow(univariate.results), neglog.pvalues, 
     pch=16, xaxt="n", ylim=c(0, max(neglog.pvalues, 3)),
     ylab="-log10(p-value)", xlab="",
     col=match(univariate.results$Exposure, exposure.Names))
abline(h=-log10(0.05/nrow(univariate.results)), lty=2, lwd=2, col=2)
axis(side=1, at=(1:R)*(P)-P*.5, labels=FALSE)
text(x=(1:R)*(P), y=par("usr")[3]-0.1, xpd=NA,
      labels=exposure.Names, adj=1.2, srt=45, cex=.6)

Effects by Exposure and Protein

ggplot(beta.results.long, 
       aes(fill=Exposure, y = Effect, x = Protein)) + 
  geom_bar(position="dodge", stat="identity") + 
  ggtitle("Effects by Exposure and Protein") +
  facet_grid(rows = vars(Exposure)) + 
  xlab("") +
  ylab("Effect") + 
  theme(text = element_text(size=1),
        axis.text.x = element_text(angle = 45, vjust = 1, 
                                   hjust = 1, size=7),
        axis.text.y = element_text(size=10),
        legend.title = element_blank(),
        legend.text = element_text(size=10))

Exposome and Proteome: Data-driven Analysis of Univariate results:

Here, we examine the pairwise results (i.e. the effect estimates \(\beta\) from the regression of each protein on each exposure) by treating the estimates as the data and performing hierarchical clustering and principal component analysis as examples.

Heatmap:

heatmap.2(x=as.matrix(beta.results), hclustfun=function(d) hclust(d, method = "ward.D2"), trace="none", cexRow =.5, cexCol = .5)

PCA: Scree Plot

fviz_eig(beta.pca)
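
The scree plot displays the share of variance carried by each principal component. In this analysis the rows of the input are the exposures and the columns are the proteins, so with 9 exposures there can be at most 8 informative components after centering. The sketch below reproduces the calculation with base R on a simulated matrix of effect estimates (`beta_toy` is an assumption with the same 9 x 36 shape as the organochlorine example).

```r
# Simulated 9 x 36 matrix of effect estimates (rows = exposures, columns = proteins)
set.seed(9)
beta_toy <- matrix(rnorm(9 * 36, sd = 0.05), nrow = 9,
                   dimnames = list(paste0("exposure", 1:9), paste0("protein", 1:36)))

# PCA with columns scaled to unit variance, as in the workshop code
pca_toy <- prcomp(beta_toy, scale. = TRUE)

# Proportion of variance per component: the quantity shown in a scree plot
var_explained <- pca_toy$sdev^2 / sum(pca_toy$sdev^2)
round(var_explained, 3)
round(cumsum(var_explained), 3)

# The last component is numerically zero because centering removes one dimension
pca_toy$sdev[9]
```

With so few rows, the components summarize similarity among exposures in their protein association profiles rather than variation among individuals.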

PCA: Proteins

fviz_pca_var(beta.pca,
             title="PCA by Protein Contribution",
             col.var = "contrib", # Color by proportional amount to the PC
             gradient.cols = c("green", "blue", "red"),
             repel = TRUE     # Avoid text overlapping
             )

PCA: Exposures

fviz_pca_ind(beta.pca,
             title="PCA by Exposure Contribution",
             col.ind = "cos2", # Color by total PC amount for each "individual"
             gradient.cols = c("green", "blue", "red"),
             repel = TRUE     # Avoid text overlapping
             )